We construct a random MERA state with a bond dimension that varies with the level of the 
MERA. This causes the state to exhibit a very different entanglement structure from that usually 
seen in MERA, with neighboring intervals of length $l$ exhibiting a mutual information proportional 
to $\epsilon l$ for some constant $\epsilon$, up to a length scale exponentially large in $1/\epsilon$. We express the entropy of a 
random MERA in terms of sums over cuts through the MERA network, with the entropy in this case 
controlled by the cut minimizing bond dimensions cut through. One motivation for this construction 
is to investigate the tightness of the Brandao-Horodecki [8] entropy bound relating entanglement to 
correlation decay. Using the random MERA, we show that at least part of the proof is tight: there 
do exist states with the required property of having linear mutual information between neighboring 
intervals at all length scales. We conjecture that this state has exponential correlation decay and 
that it demonstrates that the Brandao-Horodecki bound is tight (at least up to constant factors), 
and we provide some numerical evidence for this as well as a sketch of how a proof of correlation 
decay might proceed. 



The amount of entanglement present in a quantum many-body system is closely related to the difficulty of simulating 
that system and to the existence of tensor networks to describe that system. For example, low Renyi entropy in one 
dimension implies that a state can be approximated by a matrix product state [1]. 

In this regard, an important question has been: how much entanglement can be present in a gapped system? Do 
such systems obey an area law [2]? The first general bound showing that a gap implies an area law in one dimension 
was given in Ref. [3]. This initial bound was very weak, with the upper bound on the entropy scaling exponentially 
in the local Hilbert space dimension and in the inverse gap. These results were significantly tightened in Ref. [4] 
to a scaling that is linear in the inverse gap and polylogarithmic in the local Hilbert space dimension. 

Closely connected to this question of entanglement compared to spectral gap is the question of entanglement 
compared to correlation length. Indeed, a spectral gap for a local Hamiltonian implies exponentially decaying 
correlations [5], so one might hope to use that correlation decay to prove an entanglement bound. At a very heuristic 
level, one might expect that if a system has correlation length $\xi$ and local Hilbert space dimension $D$, then any region 
$A$ of arbitrary length will only be correlated with degrees of freedom within distance $\xi$ and will be decoupled from 
the rest of the system. Let $B$ be the degrees of freedom within distance $\xi$ of $A$ and let $C$ be the rest of the system. 
Then, if $A$ is decoupled from $C$, then $A$ has entanglement entropy at most roughly $\xi \log(D)$. This heuristic argument 
of course has the problem that "correlations" measure whether there are operators $O_A, O_C$ supported on $A, C$ such 
that $\langle O_A O_C \rangle - \langle O_A \rangle \langle O_C \rangle$ is large, while the required decoupling is that $\rho_{AC}$ is close to $\rho_A \otimes \rho_C$, which is a stronger 
property. 

Indeed, early evidence suggested that this heuristic argument was completely incorrect. Using quantum expanders [6], 
[7], a family of states was constructed with exponential decay of correlations, with uniformly bounded correlation 
length and fixed local Hilbert space dimension, but with arbitrarily high entanglement. However, more recently in 
Ref. [8] Brandao and Horodecki showed that exponential decay of correlations does imply a bound on entanglement, 
apparently contradicting the previous result. The resolution of the apparent contradiction is explained in Ref. [8]. To 
explain this resolution, let us first fix some notation. As in Ref. [8], we define the correlation function between two 
regions $X, Y$ as 

$$\mathrm{Cor}(X:Y) = \max_{\|O_X\| \leq 1,\ \|O_Y\| \leq 1} \left| \mathrm{tr}(O_X O_Y \rho) - \mathrm{tr}(O_X \rho)\, \mathrm{tr}(O_Y \rho) \right|, \tag{1}$$

where $O_X, O_Y$ are operators supported on $X, Y$ and $\|\ldots\|$ denotes the operator norm. Let us say that a state has 
$(\xi, l_0, C)$-exponential decay of correlations if for any pair of regions $X, Y$ separated by $l$ sites with $l \geq l_0$ then 

$$\mathrm{Cor}(X:Y) \leq C\, 2^{-l/\xi}. \tag{2}$$

In Ref. [8] it is proven that for any connected region $X$, for any pure state on a sufficiently large system which has 
$(\xi, l_0, 1)$-exponential decay of correlations, 

$$S(\rho_X) \leq c'\, l_0 \exp(c \log(\xi)\, \xi), \tag{3}$$

for some universal constants $c, c' > 0$, where $S(\ldots)$ denotes the von Neumann entropy of a density matrix. That is, 
the state obeys an "area law" as this quantity is independent of the size of $X$; however, it diverges rather rapidly with 
$\xi$. 

Note that this result can also be applied to a system with $(\xi, l_0, C)$-exponential decay of correlations for $C > 1$: if a 
state has $(\xi, l_0, C)$-exponential decay, then it has $(\xi, l_0', 1)$-exponential decay of correlations, with $l_0' = \min(l_0, \xi \log_2(C))$. 

Now we can explain the resolution of the paradox: in Ref. [8] this exponential decay of correlations was assumed 
to hold for all pairs of regions $X, Y$. However, the quantum expander result only shows this exponential decay of 
correlations for a system on an infinite line, where $X$ represents an interval of sites $[i,j]$ and $Y$ represents another 
interval of sites $[k,l]$ with $i < j < k < l$. The quantum expander result would also show such an exponential decay 
for a system on a finite line for the same pair of intervals $[i,j]$ and $[k,l]$ if $i$ and $l$ are sufficiently far from the left and 
right ends of the line. As noted in Ref. [9], the expander construction does give a $(\xi, l_0, C)$-exponential decay with a $C$ 
that is uniformly bounded above for such pairs of regions (although this bound on $C$ was not shown in the original 
paper applying expanders to constructing many-body states), so the magnitude of the constant $C$ is not the issue. 
However, the quantum expander result does not show correlation decay when $X$ is an interval $[i,j]$ and $Y$ is the union 
of a pair of intervals $[k,l]$ and $[m,n]$ with $k < l < i < j < m < n$. That is, $Y$ is on both the left and right side of $X$, 
rather than just being to one side. This difference in the geometry of the regions $X, Y$ considered is the reason for 
the different result. 

So, given the Brandao-Horodecki result, we ask whether this result is tight. Is it possible to construct a family of 
states with $(\xi, l_0, C)$-exponential decay with fixed $C, l_0$ and increasing $\xi$ that have an exponential divergence of the 
entanglement with $\xi$? Rather than building a state using a random expander, we instead turn to a random MERA 
state [10]. 

While our ultimate goal is to construct a state with large entanglement entropy and small correlations, the 
Brandao-Horodecki proof provides some clue as to how to do this. A key portion of the proof involves considering three regions, 
called $C, L, R$ (in the proof, they actually write $B_C, B_L, B_R$ to differentiate them from other regions considered, but 
since we will only consider these three, we just write $C, L, R$), standing for "center", "left", "right". The center region 
$C$ is an interval of $2l$ sites, for some $l$. The left region $L$ consists of the $l$ sites immediately to the left of $C$, while 
the right region $R$ consists of the $l$ sites immediately to the right of $C$. The authors show that if for some choice 
of regions $C, L, R$ within distance $\exp(l/\xi)$ of $X$ we have $I(C:LR) \leq \epsilon l$ for sufficiently small $\epsilon$, then the entropy 
bound (3) follows; the authors do this using the exponential correlation decay to prove the desired area law with the 
assistance of some results from quantum information theory (the $\epsilon$ needed for this to work depends upon $\xi$, and $\epsilon$ 
is taken proportional to $1/\xi$). The authors then show that such regions $C, L, R$ do indeed exist: in general for any 
$\epsilon > 0$, for any site $s$ there are regions $C, L, R$ within $\exp(O(1/\epsilon))$ sites of $s$, with length $l \leq \exp(O(1/\epsilon))$, such that 
$I(C:LR) \leq \epsilon l$. This is shown using an adaptation of a result in Ref. [3], roughly as follows: suppose that 
$I(C:LR) > \epsilon l$ for all length scales $l$. Any interval of a single site has entropy at most $\log(D)$, where $D$ is the Hilbert 
space dimension on a single site. So, any interval of length 2 has entropy at most $2\log(D)$. Applying the assumption 
$I(C:LR) > \epsilon l$ to $C$ having length $2l = 2$, we find that the entropy of an interval of length 4 is at most $4\log(D) - \epsilon$ and 
hence any interval of length 8 has entropy at most $8\log(D) - 2\epsilon$. Then, applying this bound to the case of $C$ having 
length $2l = 8$, we find that the entropy of an interval of length 16 is at most $16\log(D) - 4\epsilon - 4\epsilon$. Iterating this, one 
eventually finds that the entropy becomes negative at some length scale $\exp(O(1/\epsilon))$, giving a contradiction. 
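
The iteration just described can be followed numerically. The Python sketch below is only an illustration under our own naming (the function `first_negative_length` and the sample values of $D$ and $\epsilon$ are not taken from Ref. [3] or Ref. [8]). It tracks upper bounds on the entropy of intervals of length $l$ and $2l$, applies the assumption $I(C:LR) > \epsilon l$ together with subadditivity, and reports the first length at which the bound turns negative. Unrolling the recursion gives the bound $4^n \log(D) - n 4^{n-1} \epsilon$ for an interval of length $4^n$, which becomes negative once $n > 4\log(D)/\epsilon$.

```python
import math

def first_negative_length(D, eps, max_steps=10_000):
    """Iterate the entropy upper bounds implied by I(C:LR) > eps*l at every scale.

    s_l and s_2l are upper bounds on the entropy of intervals of length l and 2l.
    Returns (length, bound) for the first interval whose upper bound is negative.
    """
    l = 1
    s_l = math.log(D)        # a single site
    s_2l = 2 * math.log(D)   # two sites, by subadditivity
    for _ in range(max_steps):
        # S(CLR) < S(C) + S(L) + S(R) - eps*l, with |C| = 2l and |L| = |R| = l
        s_4l = s_2l + 2 * s_l - eps * l
        if s_4l < 0:
            return 4 * l, s_4l
        # subadditivity again gives the bound for an interval of length 8l
        s_8l = 2 * s_4l
        l, s_l, s_2l = 4 * l, s_4l, s_8l
    raise RuntimeError("bound did not become negative within max_steps")

if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.1):
        length, bound = first_negative_length(2, eps)
        print(f"eps={eps}: bound first negative at length {length}")
```

For $D = 2$ the contradiction appears at a length that grows exponentially as $\epsilon$ decreases, in line with the $\exp(O(1/\epsilon))$ scale quoted above.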

Hence, while we learn from this that we can't achieve $I(C:LR) > \epsilon l$ for all regions, in the state we are constructing 
we still would like to keep $I(C:LR)$ large (i.e., larger than some constant times $l$) up to some large length scale 
(i.e., exponentially large in $1/\epsilon$, taking $1/\epsilon$ of order $\xi$) in order to construct a state with large entanglement entropy 
and small correlations: if we make $I(C:LR)$ too small, then the area law bound will follow from that step in the 
Brandao-Horodecki proof. 

In fact, it isn't even clear from the proof whether or not it is possible to construct even a state such that 
$I(C:LR) > \epsilon l$ for all regions $C, L, R$ with $l$ sufficiently small compared to $\exp(\mathrm{const.}/\epsilon)$ for some constant which may 
depend upon the Hilbert space dimension on each site. If this were not possible (for example, if one could only have 
$I(C:LR) > \epsilon l$ for $l$ small compared to $1/\epsilon$) this would immediately tighten the Brandao-Horodecki result. So, part 
of our construction will be showing that this is possible. We will in fact show a mutual information lower bound that 
implies this one: we will construct a state such that for every pair of neighboring intervals of length $l < \exp(1/\epsilon)$, 
the mutual information is lower bounded by $\epsilon l$ (in fact, since we have a random state, we will show this result in 
expectation; see the Discussion). This will have the side effect of producing mutual information between $C$ and $LR$ 
in the choice of intervals above: the left half of $C$ will have mutual information with $L$ and the right half of $C$ will 
have mutual information with $R$. 

Note it is easy to construct a state such that this bound holds for very particular choices of $C, L, R$. That is, we can 
ensure that the bound holds for at least one choice of $C, L, R$ at each length scale up to $\exp(\mathrm{const.}/\epsilon)$. We 
now sketch this construction. The tools we develop later to analyze the MERA construction can be used to analyze 
this case and verify the claims in this paragraph. Construct a quantum circuit with the form of a binary tree. The 
input to the quantum circuit is a state in a 1-dimensional Hilbert space. The node at the top of the tree represents an 
isometry that maps this state to a system of two sites, each with some given dimension $D_1$; i.e., this is 
an isometry from a 1-dimensional Hilbert space to a $D_1 \times D_1$-dimensional Hilbert space. The two nodes at the next 
level of the tree represent further isometries, mapping each $D_1$-dimensional Hilbert space to a pair of $D_2$-dimensional 
Hilbert spaces, and so on. We choose these isometries at random, and we choose the dimensions $D_k$ for $k > 1$ so 
that $D_k$ is slightly larger than $\sqrt{D_{k-1}}$; more accurately, take $\log(D_k) \approx (1/2)\log(D_{k-1}) + \epsilon 2^{L-k}$, where $L$ is the 
total number of levels of the tree and $\epsilon$ is some small number. The leaves of the tree represent the final states of the 
system. Call two nodes "siblings" if they are both at the same level of the tree and have the same parent. One may 
guess then from the increase in dimension of the Hilbert space that there will be some mutual information between 
the leaves which are descendants of any given node and those which are descendants of the sibling of that node. The 
entropy of the descendants of the first node will be roughly $\log(D_k)$ if the node is at level $k$, as will be the entropy of 
the descendants of the other node, while the entropy of the combination will be only $\log(D_{k-1}) < 2\log(D_k)$. However, 
for other choices of intervals of leaves of the tree, there will be almost no mutual information. 
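
To make the heuristic estimate for the tree concrete, the following Python sketch iterates the recursion $\log(D_k) = (1/2)\log(D_{k-1}) + \epsilon 2^{L-k}$ as if it were exact, starting from $\log(D_0) = 0$ for the 1-dimensional input, and evaluates the rough sibling mutual information $2\log(D_k) - \log(D_{k-1})$. The function names and the sample values of $L$ and $\epsilon$ are our own illustrative choices, and the entropies used are the rough estimates stated in the text rather than exact values.

```python
def tree_log_dims(L, eps):
    """Return [log D_0, ..., log D_L] for the tree, treating the recursion as exact.

    log D_0 = 0 corresponds to the 1-dimensional input Hilbert space.
    """
    log_d = [0.0]
    for k in range(1, L + 1):
        log_d.append(0.5 * log_d[k - 1] + eps * 2 ** (L - k))
    return log_d

def sibling_mutual_information(log_d, k):
    """Rough estimate I = S(A) + S(B) - S(AB) for two sibling nodes at level k.

    Uses S(A) = S(B) = log D_k and S(AB) = log D_{k-1}, as in the heuristic argument.
    """
    return 2 * log_d[k] - log_d[k - 1]

if __name__ == "__main__":
    L, eps = 6, 0.25
    log_d = tree_log_dims(L, eps)
    for k in range(1, L + 1):
        leaves = 2 ** (L - k)  # number of leaves below a node at level k
        print(k, leaves, sibling_mutual_information(log_d, k))
```

Under these assumptions the recursion solves to $\log(D_k) = k \epsilon 2^{L-k}$, so two siblings at level $k$, each with $l = 2^{L-k}$ descendant leaves, have estimated mutual information $2\epsilon l$, linear in the interval length.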

However, this construction only gives the mutual information for certain choices of $C, L, R$. We would like to have 
it hold for all choices. So, we instead use a MERA state. This is similar to the tree state, except with additional 
"disentanglers". 

Our construction of a random MERA state has some properties that may have a holographic interpretation. See 
the Discussion. 

An important question is whether the state that we construct indeed has exponential decay of correlations. We 
conjecture that it does and we sketch how a proof of such conjecture might proceed and we provide some numerical 
evidence. However, we leave a proof of this statement for the future. 


MERA STATE 

We define the MERA network as follows. See Fig. 1. We start with a single site with a 1-dimensional Hilbert space 
(thus, up to an irrelevant choice of phase, the state of the system on this site is fixed; call this initial state $\psi_0$). We 
then apply a series of isometries to this state, giving a new state 

$$\psi = W_L V_L \cdots W_3 V_3 W_2 V_2 W_1 V_1 \psi_0, \tag{4}$$

for some $L$, where $L$ is the number of "levels". The final state $\psi$ is a state on $N = 2^L$ sites. Each $V_k$ is an isometry 
that maps a system on $2^{k-1}$ sites with some Hilbert space dimension $D_{k-1}$ on each site to a system on $2^k$ sites with 
some Hilbert space dimension $D'_k$ on each site. We number the sites before applying $V_k$ by numbers $0, 1, 2, \ldots, 2^{k-1} - 1$ 
and after applying $V_k$ by numbers $0, 1, 2, \ldots, 2^k - 1$. Each $V_k$ is a product of isometries on each of the $2^{k-1}$ sites, 
mapping each site to a pair of sites; the $j$-th site is mapped to a pair of sites $2j, 2j+1$. Each $W_k$ is another isometry. 
The isometry $W_k$ preserves the number of sites, mapping a system of $2^k$ sites with dimension $D'_k$ on each site to a 
system of $2^k$ sites with dimension $D_k$ on each site. Each $W_k$ is also a product of isometries, but in this case it is a 
product of isometries on pairs of sites; it maps each pair $2j+1, 2j+2 \bmod 2^k$ to the same pair. 

We will say that isometries $V_i, W_i$ with smaller $i$ are at higher levels of the MERA while those with larger $i$ are at 
lower levels of the MERA. That is, the height of a level will increase as we move upwards in the figure. Each "level" 
of the MERA will include two rows of the figure, one with the isometry $W$ and one with the isometry $V$. 

Note that pairs of sites are defined modulo $2^k$ in the definition of $W_k$. If the sites are written on a line in order 
$0, \ldots, 2^k - 1$, then $W_k$ will entangle the rightmost and leftmost sites. The introduction of $W_1$ in the definition of $\psi$ 
above is slightly redundant, since $V_1$ already produces entanglement between sites 0,1; however, we leave $W_1$ in to 
keep the definition of the MERA consistent from level to level. 

We will explain the choice of dimensions $D_k, D'_k$ later. In a difference from traditional MERA states, the dimensions 
$D_k, D'_k$ will be chosen differently at each level. Further, the dimension $D_k$ will be larger than $D'_k$. That is, the $W_k$ 
(sometimes called "disentanglers") will have the effect of increasing the Hilbert space dimension of each site, and 
hence of the system as a whole. 

The isometries $W_k, V_k$ will be chosen randomly. More precisely, each $V_k$ is a product of isometries on each of the $2^{k-1}$ 
sites, mapping each site to a pair of sites. Each of the isometries in this product will be chosen at random from the 
Haar uniform distribution, independently of all other isometries. Similarly, each $W_k$ is also a product of isometries, 
each of which will again be chosen at random from the Haar uniform distribution, independently of all other isometries. 






FIG. 1: Illustration of MERA network. Circle at top represents state $\psi_0$. Isometry $V_1$ is represented by the lines leading to 
a pair of circles below it. Isometry $W_1$ is represented by the filled rectangle mapping that pair of circles to another pair of 
circles (note that in this case, $W_1$ could be absorbed into a redefinition of $V_1$, while $W_i$ for $i > 1$ cannot be absorbed into $V_i$). 
Isometry $V_2$ maps each circle in the pair to another pair of circles. Isometry $W_2$ maps the four sites to another four sites. The 
isometry on sites 1,2 is represented by the filled rectangle in the middle, while the isometry on sites 0,3 is represented by the 
lines leading to half a filled rectangle on left and right sides of the figure. 


ENTANGLEMENT ENTROPY OF INTERVAL 

We now estimate the entanglement entropy of an interval of sites. We start with some notation. We write $[i,j]$ to 
denote the interval of sites $i, i+1, \ldots, j-1, j$. We define $\psi(k) = W_k V_k \cdots W_1 V_1 \psi_0$, so that $\psi = \psi(L)$, and we define 
$\sigma(k) = |\psi(k)\rangle\langle\psi(k)|$. We define $\phi(k) = V_k W_{k-1} V_{k-1} \cdots W_1 V_1 \psi_0$ and we define $\tau(k) = |\phi(k)\rangle\langle\phi(k)|$. We begin with an 
upper bound to the von Neumann entropy using a recurrence relation. We then derive a similar recurrence relation for 
the expectation value of the second Renyi entropy and use that to lower bound the expected von Neumann entropy. 
We then combine these bounds to get an estimate on the expected entropy of an interval. These general bounds will 
hold for any sufficiently large choice of $D_k, D'_k$; we then specialize to a particular choice to obtain the desired state 
with large entanglement. 


Upper Bound to von Neumann Entropy By Recurrence Relation 


We begin with a trivial upper bound for $S(\sigma(k)_{[i,j]})$, which denotes the von Neumann entropy of the reduced density 
matrix of $\sigma(k)$ on the interval $[i,j]$. Since the $W_k$ are isometries, we have 

$$i \bmod 2 = 1,\ j \bmod 2 = 0 \ \rightarrow\ S(\sigma(k)_{[i,j]}) = S(\tau(k)_{[i,j]}). \tag{5}$$

That is, in the case that $i$ is odd and $j$ is even, the interval $[i,j]$ in state $\sigma(k)$ is obtained by an isometry acting on 
that interval in the state $\tau(k)$. If $i = j$, we have the bound 

$$i = j \ \rightarrow\ S(\sigma(k)_{[i,j]}) \leq \log(D_k). \tag{6}$$

In all other cases (if, for example, $i$ is even or $j$ is odd or both), the entropy can be bounded above using subadditivity: 

$$S(\sigma(k)_{[i,j]}) \leq S(\sigma(k)_{[m,n]}) + \log(D_k)\,(|m-i| + |n-j|) \tag{7}$$

for any choice of $m, n$. Combining Eqs. (5), (7) gives us the bound 

$$S(\sigma(k)_{[i,j]}) \leq \min_{m \bmod 2 = 1,\ n \bmod 2 = 0} \Big( S(\tau(k)_{[m,n]}) + \log(D_k)\,(|m-i| + |n-j|) \Big). \tag{8}$$

Although in fact this equation holds for any choices of $m, n$, for all applications we will restrict to $m, n$ such that 
$|m-i| \leq 1$, $|n-j| \leq 1$. 

We emphasize that in the above equation, and from now on, all differences, such as $m - i$, are taken modulo the 
number of sites at the given level of the MERA. When we compute a difference such as $m - i$, by $|m - i|$ we mean 
$|k|$ for the integer $k$ with minimum $|k|$ such that $m - i = k$ modulo the number of sites. Similarly, if we write, for two sites 
$i, j$, that $i = j + 1$, we again mean modulo the number of sites at the given level. 

Of course, if we have the empty interval, which we write as $[i,j]$ for $i = j + 1 \bmod 2^k$, then the entropy is equal to 0. 

Similarly we have 

$$S(\tau(k)_{[i,j]}) \leq \min_{m \bmod 2 = 0,\ n \bmod 2 = 1} \Big( S(\sigma(k-1)_{[m/2,(n-1)/2]}) + \log(D'_k)\,(|m-i| + |n-j|) \Big). \tag{9}$$

We will only use Eq. (9) with $|m-i| \leq 1$, $|n-j| \leq 1$. 


Expectation Value of Renyi Entropy 


We now obtain a recurrence relation for the expectation value of the Renyi entropy $S_2$, defined by $S_2(\rho) = 
-\log(\mathrm{tr}(\rho^2))$. The analogue of Eq. (5) still holds for $S_2$: 

$$i \bmod 2 = 1,\ j \bmod 2 = 0 \ \rightarrow\ S_2(\sigma(k)_{[i,j]}) = S_2(\tau(k)_{[i,j]}), \tag{10}$$

as does 

$$S_2(\sigma(k)_{[i,j]}) \leq \min_{\substack{m \bmod 2 = 1,\ n \bmod 2 = 0 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \Big( S_2(\tau(k)_{[m,n]}) + \log(D_k)\,(|m-i| + |n-j|) \Big), \tag{11}$$

and 

$$S_2(\tau(k)_{[i,j]}) \leq \min_{\substack{m \bmod 2 = 0,\ n \bmod 2 = 1 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \Big( S_2(\sigma(k-1)_{[m/2,(n-1)/2]}) + \log(D'_k)\,(|m-i| + |n-j|) \Big). \tag{12}$$

We refer to Eqs. (11), (12) as the reduction equations. These equations make sense also in the case that we have $m > n$. 
This can occur, for example, if $j = i$ or $j = i + 1$, in which case the equations allow us to bound $S_2(\sigma(k)_{[i,j]}) \leq 
\log(D_k)\,|j - i + 1|$ and $S_2(\tau(k)_{[i,j]}) \leq \log(D'_k)\,|j - i + 1|$. 

We now show that the upper bound given by repeatedly applying these equations to obtain the optimum result (i.e., 
the result which minimizes the $S_2$) is tight for the expectation value of $S_2$, up to some corrections proportional to a 
certain level in the MERA. That is, we give a lower bound on the expectation value of $S_2$. Consider first $S_2(\sigma(k)_{[i,j]})$. 
Assume first that $i \neq j$ and $i \bmod 2 = 0$, $j \bmod 2 = 0$ (we discuss the other cases later; they will be very analogous to 
this case). We write the Hilbert space of the system of sites $0, \ldots, 2^k - 1$ as a tensor product of four Hilbert spaces. 
These will be labelled $A_1, A_2, B, R$, where $A_1$ is the Hilbert space on site $i - 1$, $A_2$ is the Hilbert space on site $i$, $B$ is 
the Hilbert space on sites $i+1, \ldots, j$, and $R$ is the Hilbert space on all other sites. In this case, the Hilbert space 
on a set of sites refers to the case in which there is a $D_k$-dimensional Hilbert space on each site. In this notation, 
$S_2(\sigma(k)_{[i,j]}) = S_2(\sigma(k)_{A_2 B})$. The isometry $W_k$ is a product of isometries on pairs of sites. Write $W_k = WX$, where 
$W$ is the isometry acting on the pair of sites $i-1, i$ and $X$ is the product of all other isometries. The isometry $W_k$ 
maps from a system of $2^k$ sites to a system of $2^k$ sites, but it changes the Hilbert space dimension from $D'_k$ to $D_k$. 
We introduce different notation to write the Hilbert space of the system with a $D'_k$-dimensional space on each site. 
We write it as a product of spaces $a, b, r$, where $a$ is the Hilbert space for sites $i-1, i$, $b$ is the Hilbert space on sites 
$i+1, \ldots, j$, and $r$ is the Hilbert space on all other sites. Then, 

$$S_2(\sigma(k)_{A_2 B}) = S_2\big(\mathrm{tr}_{A_1 r}(W \tau(k) W^\dagger)\big). \tag{13}$$


That is, we wish to compute the entanglement entropy of $W\phi(k)$ on $A_2 b$. The isometry $W$ is from $a$ to $A_1 \otimes A_2$. 

Note that since the logarithm is a concave function and so the negative of the logarithm is a convex function, we 
have 

$$E\Big[S_2\big(\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)\big)\Big]_W = -E\Big[\log \mathrm{tr}\big([\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)]^2\big)\Big]_W \geq -\log E\Big[\mathrm{tr}\big([\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)]^2\big)\Big]_W, \tag{14}$$


where $E[\ldots]_W$ denotes the average over $W$. The trace $\mathrm{tr}\big([\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)]^2\big)$ is a second-order polynomial in $W$ 
and a second-order polynomial in the complex conjugate of $W$. 

FIG. 2: Identity for average of $W$. Left-hand side represents expectation value of a product of two powers of $W$ and two powers 
of $\overline{W}$. Right-hand side pictorially shows the result of Eq. (15), where the arcs represent Kronecker $\delta$. 

For an arbitrary isometry $W$ from a Hilbert space of dimension $d_1$ to a Hilbert space of dimension $d_2$ (in this case, 
$d_1 = (D'_k)^2$ and $d_2 = (D_k)^2$ since $W$ is an isometry from pairs of sites to pairs of sites; note also that $d_2 \geq d_1$ 
since this is an isometry), we can average this trace over choices of $W$ using the identity for the matrix elements 
of $W$ and $\overline{W}$: 


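$$E\big[W_{ij}\,\overline{W}_{kl}\,W_{ab}\,\overline{W}_{cd}\big]_W = c\,\big(\delta_{ik}\delta_{jl}\delta_{ac}\delta_{bd} + \delta_{ic}\delta_{jd}\delta_{ka}\delta_{lb}\big) + c'\,\big(\delta_{ik}\delta_{jd}\delta_{ac}\delta_{lb} + \delta_{ic}\delta_{jl}\delta_{ka}\delta_{bd}\big), \tag{15}$$

where 

$$c = \frac{d_1^2\,(d_1^2 d_2^2 + d_1 d_2) - d_1\,(d_1^2 d_2 + d_1 d_2^2)}{(d_1^2 d_2^2 + d_1 d_2)^2 - (d_1^2 d_2 + d_1 d_2^2)^2} \tag{16}$$

and 

$$c' = \frac{d_1\,(d_1^2 d_2^2 + d_1 d_2) - d_1^2\,(d_1^2 d_2 + d_1 d_2^2)}{(d_1^2 d_2^2 + d_1 d_2)^2 - (d_1^2 d_2 + d_1 d_2^2)^2}. \tag{17}$$

Eq. (15) is illustrated in Fig. 2. Some of these averages are similar to calculations in Ref. [11]. 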

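Note that the right-hand side of Eq. (15) is the most general function that is invariant under unitary rotations 
$W \rightarrow U W U'$ for arbitrary unitaries $U, U'$ and invariant under interchange $i,j \leftrightarrow a,b$ or $k,l \leftrightarrow c,d$. The constants $c, c'$ 
can be fixed by taking traces with $\delta_{ik}\delta_{jl}\delta_{ac}\delta_{bd}$ and $\delta_{ik}\delta_{jd}\delta_{ac}\delta_{lb}$ and computing the expectation value. The trace of 
the right-hand side with $\delta_{ik}\delta_{jl}\delta_{ac}\delta_{bd}$ is equal to $c\,(d_1^2 d_2^2 + d_1 d_2) + c'\,(d_1^2 d_2 + d_1 d_2^2)$. One can readily show that the trace 
with the left-hand side is equal to $d_1^2$, as the trace with $\delta_{ik}\delta_{jl}\delta_{ac}\delta_{bd}$ is independent of the choice of $W$. The trace of 
the right-hand side with $\delta_{ik}\delta_{jd}\delta_{ac}\delta_{lb}$ is equal to $c'\,(d_1^2 d_2^2 + d_1 d_2) + c\,(d_1^2 d_2 + d_1 d_2^2)$, while the trace with the left-hand 
side is equal to $d_1$. So, this gives 

$$d_1^2 = c\,(d_1^2 d_2^2 + d_1 d_2) + c'\,(d_1^2 d_2 + d_1 d_2^2), \tag{18}$$

$$d_1 = c'\,(d_1^2 d_2^2 + d_1 d_2) + c\,(d_1^2 d_2 + d_1 d_2^2). \tag{19}$$

Solving these gives Eqs. (16), (17). 

If we use Eq. (15) to compute $E\big[\mathrm{tr}\big([\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)]^2\big)\big]_W$ we find a sum of four terms, one for each of the terms 
on the right-hand side of Eq. (15). The result is 

$$\begin{aligned}
E\Big[\mathrm{tr}\big([\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)]^2\big)\Big]_W
&= (c + c')\,D_k^3 \cdot \Big(\mathrm{tr}\big([\mathrm{tr}_{ar}(\tau(k))]^2\big) + \mathrm{tr}\big([\mathrm{tr}_{r}(\tau(k))]^2\big)\Big) \\
&= \frac{D_k^3}{D_k^4 + D_k^2} \cdot \Big(\mathrm{tr}\big([\mathrm{tr}_{ar}(\tau(k))]^2\big) + \mathrm{tr}\big([\mathrm{tr}_{r}(\tau(k))]^2\big)\Big) \\
&= \frac{1}{D_k} \cdot \big(1 - O(1/D_k)\big) \cdot \Big(\mathrm{tr}\big([\mathrm{tr}_{ar}(\tau(k))]^2\big) + \mathrm{tr}\big([\mathrm{tr}_{r}(\tau(k))]^2\big)\Big),
\end{aligned} \tag{20}$$

where the asymptotic $O(\ldots)$ notation refers to scaling in $D_k$ in this equation. 

Note that since both terms on the right-hand side of last line of Eq. (20) are positive, the last line is at most equal 
to twice the maximum term. 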
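
As a sanity check on the constants, the following Python sketch solves the two trace conditions, Eqs. (18), (19), in exact rational arithmetic. The function name `haar_constants` and the sample dimensions are our own illustrative choices. The closed forms $c = 1/(d_2^2 - 1)$ and $c' = -1/(d_2(d_2^2 - 1))$ used in the comparison follow from factoring Eqs. (16), (17) and are not written out in the text; they give $c + c' = 1/(d_2(d_2 + 1))$, which with $d_2 = D_k^2$ reproduces the prefactor $D_k^3/(D_k^4 + D_k^2)$ in Eq. (20). The solution requires $d_1 > 1$, since the linear system is degenerate for $d_1 = 1$.

```python
from fractions import Fraction

def haar_constants(d1, d2):
    """Solve Eqs. (18), (19) exactly for the constants (c, c') of Eq. (15).

    d1 is the input dimension and d2 >= d1 the output dimension of the isometry.
    """
    if d1 < 2 or d2 < d1:
        raise ValueError("need 2 <= d1 <= d2")
    A = Fraction(d1 ** 2 * d2 ** 2 + d1 * d2)
    B = Fraction(d1 ** 2 * d2 + d1 * d2 ** 2)
    det = A * A - B * B
    c = (d1 ** 2 * A - d1 * B) / det          # Eq. (16)
    c_prime = (d1 * A - d1 ** 2 * B) / det    # Eq. (17)
    return c, c_prime

if __name__ == "__main__":
    # hypothetical sizes: D'_k = 2 and D_k = 3, so d1 = 4 and d2 = 9
    D_k = 3
    c, c_prime = haar_constants(4, D_k ** 2)
    print(c, c_prime, (c + c_prime) * D_k ** 3)
```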
Note that the terms on the right-hand side are related to Renyi entropy; for example, minus the logarithm of 
$\mathrm{tr}\big([\mathrm{tr}_{ar}(\tau(k))]^2\big)$ is equal to $S_2(\tau_{ar}(k))$, i.e. the $S_2$ Renyi entropy of $\tau(k)$ on $ar$, and similarly for $r$. So, we get: 

$$E[S_2(\sigma(k)_{[i,j]})]_W \geq -\log\Big\{ \sum_{\substack{m \bmod 2 = 1,\ n \bmod 2 = 0 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \exp\big(-S_2(\tau(k)_{[m,n]}) - \log(D_k)\,(|m-i| + |n-j|)\big) \Big\}. \tag{21}$$

The reader might note that we have in fact only derived Eq. (21) in the case that $j \bmod 2 = 0$. However, by averaging 
over isometries at both left and right ends of the interval, one can handle the case that $j \bmod 2 = 1$ identically to 
above. 

Note also that since all terms in the sum of Eq. (21) are positive, the sum is bounded by a constant times the 
maximum. So, minus the logarithm of the right-hand side is equal to 

$$\log(D_k) + \min\big(S_2(\tau_{ar}(k)),\ S_2(\tau_{r}(k))\big) - O(1).$$

So, using Eq. (14), we have that 

$$E[S_2(\sigma(k)_{[i,j]})]_W \geq \min_{\substack{m \bmod 2 = 1,\ n \bmod 2 = 0 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \Big( S_2(\tau(k)_{[m,n]}) + \log(D_k)\,(|m-i| + |n-j|) \Big) - O(1). \tag{22}$$

Here the $O(1)$ notation refers to a term bounded by a constant, independent of all dimensions $D_k, D'_k$. 
We will also want the result that 

$$E\big[\exp(-S_2(\sigma(k)_{[i,j]}))\big]_W \leq \sum_{\substack{m \bmod 2 = 1,\ n \bmod 2 = 0 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \exp\big(-S_2(\tau(k)_{[m,n]}) - \log(D_k)\,(|m-i| + |n-j|)\big). \tag{23}$$

Note that the left-hand side of the above equation is the quantity $E\big[\mathrm{tr}\big([\mathrm{tr}_{A_1 r}(W\tau(k)W^\dagger)]^2\big)\big]_W$ that we have been 
considering and Eq. (21) follows from this by convexity. 

We also give the analogs of Eqs. (21), (22), (23) for the entropy of $\tau(k)$: 

$$E[S_2(\tau(k)_{[i,j]})]_W \geq -\log\Big\{ \sum_{\substack{m \bmod 2 = 0,\ n \bmod 2 = 1 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \exp\big(-S_2(\sigma(k-1)_{[m/2,(n-1)/2]}) - \log(D'_k)\,(|m-i| + |n-j|)\big) \Big\}, \tag{24}$$

$$E[S_2(\tau(k)_{[i,j]})]_W \geq \min_{\substack{m \bmod 2 = 0,\ n \bmod 2 = 1 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \Big( S_2(\sigma(k-1)_{[m/2,(n-1)/2]}) + \log(D'_k)\,(|m-i| + |n-j|) \Big) - O(1), \tag{25}$$

$$E\big[\exp(-S_2(\tau(k)_{[i,j]}))\big]_W \leq \sum_{\substack{m \bmod 2 = 0,\ n \bmod 2 = 1 \\ m,n\ \mathrm{s.t.}\ |m-i| \leq 1,\ |n-j| \leq 1}} \exp\big(-S_2(\sigma(k-1)_{[m/2,(n-1)/2]}) - \log(D'_k)\,(|m-i| + |n-j|)\big). \tag{26}$$

This now allows us to upper and lower bound the expectation value of $S_2$ for any interval $[i,j]$. Given an interval 
$[i,j]$, let a reduction sequence denote a sequence of choices at each level to reduce $[i,j]$ to the empty interval, so that 
at each step we apply either Eq. (11) or Eq. (12) until we are left with $i > j$, at which point we are left with the empty 
interval, which has entropy 0. That is, such a sequence consists first of a choice $m, n$ with $m \bmod 2 = 1$, $n \bmod 2 = 0$, 
followed by a choice $m, n$ with $m \bmod 2 = 0$, $n \bmod 2 = 1$, and so on, with $i, j$ at each step being determined by the 
$m, n$ at the previous step. For such a sequence $Q$, let $S(Q)$ denote the upper bound to the entropy obtained from the 
reduction equations; in this reduction, once we obtain the empty interval, we use the fact that it has entropy 0. Let 
$h(Q)$ denote the height of a given reduction sequence, namely the number of times we apply the reduction equations 
until we arrive at the empty interval. Note that the height increases by 2 every time we change the level by 1 since 
we apply both equations. 
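
The minimization over reduction sequences can be organized as a short recursion. The Python sketch below is a simplified illustration, not the procedure used in this paper: it treats site labels as integers on an infinite line, so it ignores the modulo-$2^k$ wraparound described earlier, and it assigns entropy 0 to any interval once level 0 (the single 1-dimensional site) is reached. The names `min_cut_bound`, `log_d` and `log_dp`, and the sample dimensions, are hypothetical.

```python
from functools import lru_cache

def min_cut_bound(L, log_d, log_dp, i, j):
    """Minimum over reduction sequences Q of S(Q) for the interval [i, j] of sigma(L).

    log_d[k] and log_dp[k] hold log(D_k) and log(D'_k) for k = 1..L (index 0 unused).
    Simplification: sites live on an infinite line (no modulo-2^k wraparound).
    """
    def candidates(x, parity):
        # positions within distance 1 of x that have the required parity
        return [x] if x % 2 == parity else [x - 1, x + 1]

    @lru_cache(maxsize=None)
    def sigma(k, i, j):
        if j < i or k == 0:
            return 0.0  # empty interval, or the 1-dimensional top site
        # Eq. (11): move to m odd, n even, paying log(D_k) per site moved
        return min(tau(k, m, n) + log_d[k] * (abs(m - i) + abs(n - j))
                   for m in candidates(i, 1) for n in candidates(j, 0))

    @lru_cache(maxsize=None)
    def tau(k, i, j):
        if j < i:
            return 0.0  # empty interval
        # Eq. (12): move to m even, n odd, paying log(D'_k) per site moved
        return min(sigma(k - 1, m // 2, (n - 1) // 2)
                   + log_dp[k] * (abs(m - i) + abs(n - j))
                   for m in candidates(i, 0) for n in candidates(j, 1))

    return sigma(L, i, j)

if __name__ == "__main__":
    # hypothetical log-dimensions for a 3-level network
    log_d = [0.0, 5.0, 3.0, 2.0]
    log_dp = [0.0, 4.0, 2.5, 1.5]
    print(min_cut_bound(3, log_d, log_dp, 1, 2))   # the pair [1, 2] output by one disentangler
    print(min_cut_bound(3, log_d, log_dp, 4, 11))  # a longer interval
```

For the pair of sites $[1,2]$ at the lowest level, which is the output of a single disentangler, the recursion returns $2\log(D'_L)$, the cost of cutting the two input legs of that disentangler.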

Then, we have the result that 

$$S_2(\tau_{[i,j]}) \leq \min_Q S(Q), \tag{27}$$

and 

$$E\big[\exp(-S_2(\tau_{[i,j]}))\big] \leq \sum_Q \exp(-S(Q)), \tag{28}$$

as follows by using Eqs. (23), (26) to sum $\exp(-S_2(\ldots))$ over reduction sequences. As a point of notation, from now on 
we use $E[\ldots]$ to denote the average over all $W, V$ in the MERA. 

Finally, we have 

Lemma 1. 

$$E[S_2(\tau_{[i,j]})] \geq \min_Q \big( S(Q) - O(1)\,h(Q) \big). \tag{29}$$


Proof. From Eq. (28) and convexity of $-\log(\ldots)$, 

$$E[S_2(\tau_{[i,j]})] \geq -\log\Big\{ \sum_Q \exp(-S(Q)) \Big\}. \tag{30}$$

Write the sum over $Q$ inside the logarithm as a sum over levels, 

$$\sum_Q \exp(-S(Q)) = \sum_h \sum_{Q,\ h(Q) = h} \exp(-S(Q)). \tag{31}$$

Since there are at most $4^h$ sequences of height $h$ (we have at most two choices at each side of the sequence), 

$$\sum_{Q,\ h(Q) = h} \exp(-S(Q)) \leq \max_{Q,\ h(Q) = h} \exp\big(-S(Q) + \ln(4)\,h(Q)\big). \tag{32}$$

Now, we use a general identity. Let $g(x)$ be any positive function such that $\sum_{x=1,2,\ldots} g(x)^{-1}$ converges to some 
constant $c$. Then, for any function $f(x)$, we have that $\sum_{x=1,2,\ldots} f(x) \leq c \cdot \max_{x=1,2,\ldots} f(x) g(x)$. To verify this 
identity, minimize $c \cdot \max_{x=1,2,\ldots} f(x) g(x)$ over positive functions $f$ subject to a constraint on $\sum_{x=1,2,\ldots} f(x)$; the 
minimum will be attained for $f(x)$ proportional to $1/g(x)$ and plugging in this choice of $f(x)$ gives the identity. So, 
picking $g(x) = 2^x$, for which $c = 1$, we get that 

$$\begin{aligned}
\sum_h \max_{Q,\ h(Q) = h} \exp\big(-S(Q) + \ln(4)\,h(Q)\big)
&\leq \max_h \max_{Q:\ h(Q) = h} \exp\big(-S(Q) + \ln(8)\,h(Q)\big) \\
&= \max_Q \exp\big(-S(Q) + \ln(8)\,h(Q)\big).
\end{aligned} \tag{33}$$

Then, Eq. (29) follows, choosing the $O(1)$ constant to be $\log(8)$. 

□ 
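
The summation identity used in the proof is easy to check numerically. The Python sketch below assumes the weight $g(x) = 2^x$, for which $\sum_{x \geq 1} g(x)^{-1} = 1$, and compares $\sum_x f(x)$ with $\max_x f(x) g(x)$ for a finite list of non-negative values; the function name `weighted_max_bound` and the sample values are our own.

```python
def weighted_max_bound(f_values):
    """Return (sum_x f(x), max_x f(x) * 2**x) for x = 1, 2, ..., len(f_values).

    With g(x) = 2**x the sum of 1/g(x) over x >= 1 is at most 1, so the first
    entry can never exceed the second when all f(x) are non-negative.
    """
    total = sum(f_values)
    bound = max(f * 2 ** x for x, f in enumerate(f_values, start=1))
    return total, bound

if __name__ == "__main__":
    # f proportional to 1/g nearly saturates the inequality
    print(weighted_max_bound([0.5, 0.25, 0.125]))
    print(weighted_max_bound([3.0, 0.1, 7.0, 0.0]))
```

Choosing $f(x)$ proportional to $1/g(x)$ brings the two sides close together, which is the extremal case mentioned in the proof.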


We remark (we will not need this for this paper) that for some choices of $D_k, D'_k$, the sum in Eq. (28) will be 
dominated by a single reduction sequence. In that event, it will be possible to tighten Eq. (29) by improving on the 
term $-O(1)\,h(Q)$ on the right-hand side. 

Further, we also have 

Lemma 2. The following inequalities for the von Neumann entropy hold: 

$$S(\tau_{[i,j]}) \leq \min_Q S(Q), \tag{34}$$

$$E[S(\tau_{[i,j]})] \geq \min_Q \big( S(Q) - O(1)\,h(Q) \big). \tag{35}$$

Proof. Eq. (34) holds by the reduction equations (8), (9) for $S$, and Eq. (35) follows by Lemma 1 since $S$ is greater than 
or equal to $S_2$. □ 


CHOICE OF $D_k, D'_k$ 


We now give the choice of $D_k, D'_k$. At the bottom of the MERA state, the leaves have dimension $D_L$ chosen to be 
any fixed value greater than 1. For example, we may take $D_L = 2$. Then, we would follow the recursion relations: 

$$\log(D'_k) \approx \log(D_k) - \epsilon 2^{L-k}, \tag{36}$$

for all $k$, and for $k < L$ 

$$\log(D_k) \approx 2\log(D'_{k+1}) - \epsilon 2^{L-k-1}. \tag{37}$$

The value $\epsilon$ here will be related to the $\epsilon$ in the Brandao-Horodecki paper and to the mutual information that we find 
between intervals. The factor of $2^{L-k}$ represents the length scale associated to a given level in the MERA state: there 
are roughly $2^{L-k}$ leaves of the MERA in the future light-cone of a given node at level $k$. Here, the "future light-cone" 
refers to the leaves such that there is a path in the MERA starting at the given node at level $k$ and moving downward, 
ending at the given leaf. The usual terminology in MERA instead refers to a causal cone of operators being mapped 
upwards to higher levels of the MERA; we discuss this later. 

We write the approximation symbol $\approx$ rather than the equals symbol $=$ because the dimensions $D_k, D'_k$ should be 
integers. So, the recursion relations we use to obtain integer dimensions are 

$$D'_k = \big\lceil \exp\{\log(D_k) - \epsilon 2^{L-k}\} \big\rceil, \tag{38}$$

$$D_k = \big\lceil \exp\{2\log(D'_{k+1}) - \epsilon 2^{L-k-1}\} \big\rceil. \tag{39}$$

We choose $\epsilon, L$ so that $D_0 = 1$. This can be done taking $L \sim 1/\epsilon$ so that the total number of sites in the system is 
equal to $\exp(O(1/\epsilon))$. The calculation is essentially that in Ref. [3] and Ref. [8], where both papers used a recursion 
relation for the entropy. Let us study the recursion relations ignoring the complications of the ceiling; that is, we 
treat Eqs. (36), (37) as if they were exact. The ceiling in the correct recursion relations has negligible effect on the 
scaling behavior. We have $D_L$ given. Then, $\log(D_{L-1}) = 2\log(D_L) - 3\epsilon$. Then, $\log(D_{L-2}) = 4\log(D_L) - 12\epsilon$ and 
$\log(D_{L-3}) = 8\log(D_L) - 36\epsilon$. In general, 

$$\log(D_{L-m}) \approx 2^m \log(D_L) - 3m\epsilon\, 2^{m-1}. \tag{40}$$


This remains positive until $m \sim 1/\epsilon$; so, as claimed, we can take $L \sim 1/\epsilon$. Note also that for all $m < L - 1$,

$$\log(D_{L-m}) \geq 2^m \epsilon. \tag{41}$$

We say that for all $m < L - 1$ because $\log(D_1)$ must be positive, so $\log(D_2)$ must be at least $\epsilon 2^{L-2}$; for many choices of $\epsilon, L$, a similar inequality will hold even for $m = L - 1$ and we will always choose $\epsilon, L$ such that this holds.
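The scaling $L \sim 1/\epsilon$ can be checked with a few lines of code. The following is a minimal sketch that iterates the recursion for the logarithms without the ceilings, written in terms of $m = L - k$. The function names and the `shift` parameter are assumptions introduced here for illustration and do not come from the text: `shift=0` applies Eq. (37) with the factor $\epsilon 2^{L-k}$ exactly as written, while `shift=1` uses $\epsilon 2^{L-k-1}$, which is the indexing under which the iteration reproduces the worked values $3\epsilon$, $12\epsilon$, $36\epsilon$ and the closed form (40). In either case the number of levels with positive $\log(D)$ grows like $1/\epsilon$.

```python
import math

def log_dimensions(log_DL, eps, shift=0, max_levels=500):
    """Iterate Eqs. (36)-(37) without the ceilings, with m = L - k.

    Returns [log D_L, log D_{L-1}, ...], stopping before the first
    non-positive entry. `shift` is a hypothetical knob: Eq. (37) is
    applied with the factor eps * 2**(L - k - shift).
    """
    logs = [log_DL]
    for m in range(max_levels):
        log_Dprime = logs[-1] - eps * 2**m               # Eq. (36) at level k = L - m
        nxt = 2 * log_Dprime - eps * 2**(m + 1 - shift)  # Eq. (37) at level k = L - m - 1
        if nxt <= 0:
            break
        logs.append(nxt)
    return logs

def closed_form(log_DL, eps, m):
    """Right-hand side of Eq. (40)."""
    return 2**m * log_DL - 3 * m * eps * 2**(m - 1)

if __name__ == "__main__":
    for eps in (0.04, 0.02, 0.01):
        depth = len(log_dimensions(math.log(2), eps, shift=1)) - 1
        # depth * eps should be roughly constant if L ~ 1/eps
        print(f"eps={eps}: {depth} levels, depth*eps={depth * eps:.3f}")
```

Halving `eps` roughly doubles the number of levels reached before the logarithm would turn non-positive, which is the statement $L \sim 1/\epsilon$.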


Entanglement Entropy For This Choice 

We now estimate the entanglement entropy for this choice of $D_k, D'_k$ for an interval $[i,j]$. We make a remark on the Big-O notation that we use. When we say in lemma 3 and lemma 4 that a quantity is $\Omega(x)$, we mean that it is lower bounded by $c_1 x - c_2 \log(l) - c_3$ for some positive constants $c_1, c_2, c_3$ which do not depend on $D_k, \epsilon$. We emphasize this because otherwise one might worry about subleading terms hidden in the Big-O notation: since the leading term often involves a factor of $\epsilon$ (at least in lemma 4), a quantity such as $\epsilon l$ becomes large only once $l$ becomes large enough and so one might worry about the simultaneous limits of large $l$ and small $\epsilon$. The notation $O(1)$ continues to refer to a quantity bounded by a constant, independent of $\epsilon, D_k$.

Lemma 3. The expected entanglement entropy of an interval $[i,j]$ with length $l = j - i + 1$ with $l \leq N/2$ is lower bounded by

$$E[S(\tau_{[i,j]})] \geq \Omega(\log(D_{L-\log_2(l)})). \tag{42}$$


Proof. We estimate $\min_Q S(Q) - O(1)h(Q)$ and apply lemma 2. For any choice of $[i,j]$, for any sequence $Q$, each time we apply Eq. (11) or Eq. (12), it is possible that we produce a positive term, $\log(D_k)(|m - i| + |n - j|)$ or $\log(D'_{k-1})(|m - i| + |n - j|)$, respectively. Let us say that if $|m - i| = 1$ then the term is applied at the “left end” of the interval, while if $|n - j| = 1$, then the term is applied at the “right end” of the interval (as the interval changes as we change level in the MERA by applying Eqs. (11,12), we continue to define the left end and right end in the natural way).

One may verify that at least every other time we apply the equations, we must produce a positive term at the left end and at least every other time we apply the equations, we must produce a positive term at the right end. That is, if Eq. (11) does not produce a positive term at the left (or right) end, then Eq. (12) must produce a positive term at the left (or right, respectively) end. The only exception to this is if the interval becomes sufficiently long that it includes all sites at the given level of the MERA; this does not happen for the intervals considered here. So,

$$S(Q) \geq 2\left(D_L + D_{L-1} + D_{L-2} + \ldots + D_{L-\lfloor h(Q)/2 \rfloor}\right). \tag{43}$$


We now estimate the minimum $h(Q)$. Every time we apply Eq. (11) and then Eq. (12), an interval of length $l$ turns into an interval of length at least $l/2 - 2$. The factor of $-2$ occurs because Eq. (11) can reduce the length by at most 2; then Eq. (12) can further reduce the length by at most 2 more, and then divide the length by 2. Thus, 2 applications of this pair of equations can map an interval of length $l$ to one of length $(l/2 - 2)/2 - 2 = l/2^2 - 3$, and $k$ applications can map an interval of length $l$ to one of length $l/2^k - 4$. Once the length becomes four or smaller, then the length can be mapped to zero by a pair of applications. Thus, $h(Q)$ is greater than or equal to $2k$ with $l/2^k - 4 \leq 4$, so $l/2^k \leq 8$ (in fact this estimate is not quite tight, as if the length of the interval is nonzero after $k$ applications, then $h(Q) > 2k$). So,

$$h(Q) \geq 2\log_2(l) - O(1). \tag{44}$$
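The counting that leads to Eq. (44) can be illustrated numerically. The sketch below is only an illustration under a simplifying assumption: it applies the most aggressive shrinkage $l \to l/2 - 2$ allowed per pair of applications of Eqs. (11) and (12), treats the length as a real number, and counts pairs until the length is four or smaller. The function name `pairs_until_short` is hypothetical.

```python
import math

def pairs_until_short(length, threshold=4.0):
    """Count applications of the pair Eq. (11) + Eq. (12) under the most
    aggressive shrinkage l -> l/2 - 2, until the length is <= threshold."""
    k = 0
    while length > threshold:
        length = length / 2 - 2
        k += 1
    return k

if __name__ == "__main__":
    for l in (16, 256, 4096, 2**20):
        k = pairs_until_short(l)
        # columns: l, lower bound 2k on h(Q), and the gap 2*log2(l) - 2k
        print(l, 2 * k, round(2 * math.log2(l) - 2 * k, 3))
```

The gap $2\log_2(l) - 2k$ stays bounded as $l$ grows, which is the content of the $O(1)$ term in Eq. (44).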


Combining Eqs. (43,44) gives Eq. (42). Here we use Eq. (40) to estimate $D_{L-h(Q)}$ and note that $\log(D_{L-\log_2(l)+O(1)}) \geq \Omega(\log(D_{L-\log_2(l)}))$. Further, we use the fact that the term $-O(1)h(Q)$ in $S(Q) - O(1)h(Q)$ is asymptotically negligible compared to $S(Q)$. □
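For later use it may help to spell out how the bound of lemma 3 turns into a bound linear in $l$. This is a short worked step under the simplifying assumption that $\log_2(l)$ is an integer smaller than $L - 1$, so that Eq. (41) applies with $m = \log_2(l)$:

```latex
\log\bigl(D_{L-m}\bigr)\Big|_{m=\log_2 l} \;\geq\; 2^{\log_2 l}\,\epsilon \;=\; \epsilon\, l
\quad\Longrightarrow\quad
E\bigl[S(\tau_{[i,j]})\bigr] \;\geq\; \Omega(\epsilon\, l),
\qquad \log_2 l < L-1 .
```

In words, the length scale $2^{L-k}$ attached to level $k = L - \log_2(l)$ equals $l$, so the logarithm of the bond dimension at that level is at least $\epsilon l$.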


Mutual Information 

We now estimate the mutual information between a pair of neighboring intervals, each of length $l$. We lower bound this by $\epsilon l$. This implies a similar lower bound on the mutual information between a single interval $[i,j]$ of length $2l$ and its two neighboring intervals of length $l$.

Lemma 4. The expected mutual information between two neighboring intervals $[i,j]$ and $[j',k]$, with $j' = j + 1$ and $l = j - i + 1 = k - j$ is lower bounded by

$$E[I([i,j];[j',k])] \geq \Omega(\epsilon l). \tag{45}$$


Proof. Call $[i,j]$ the “left interval” and call $[j',k]$ the “right interval”. Let $Q_L, Q_R$ be reduction sequences for $[i,j]$ and $[j',k]$, respectively, which minimize $S(Q) - O(1)h(Q)$. So, $E[S(\tau_{[i,j]}) + S(\tau_{[j',k]})] \geq S(Q_L) + S(Q_R) - O(1)h(Q_L) - O(1)h(Q_R)$.

We now show that $S(\tau_{[i,k]}) \leq S(Q_L) + S(Q_R) - \Omega(\epsilon)l$. Note that always the optimum reduction sequences have $h(Q_L), h(Q_R) \leq \mathrm{const.} \times \log(l)$. So, this upper bound on $S(Q)$ will imply Eq. (45). This bound will be based on constructing a reduction sequence for $[i,k]$; however, we will in one case need to also use subadditivity and then construct further reduction sequences. That is, it will not simply be a matter of applying Eqs. (11,12) with a given sequence but a more general reduction procedure will be needed.

Let $i$ be the left end of interval $[i,j]$ and $j$ be the right end. Refer to Fig. 3. The reduction sequence $Q_L$ describes how both the left and right end of the left interval move as we change levels in the MERA. Let $i_0, i_1, i_2, \ldots, i_{h(Q_L)}$ be the sequence describing where the left end is after each application and let $j_0, \ldots, j_{h(Q_L)}$ describe where the right end is. That is, after $k$ applications of Eqs. (11,12), we have a new interval $[i_k, j_k]$. Eventually, after $h(Q_L)$ applications of the reduction equation, the interval has length zero so that $i_{h(Q_L)} = j_{h(Q_L)} + 1$. Similarly, let $j'_0, \ldots, j'_{h(Q_R)}$ and $k_0, \ldots, k_{h(Q_R)}$ be the left and right ends of the right interval.

Let $S_L(Q_L)$ denote the sum of the quantities $\log(D_k)|m - i|$ or $\log(D'_{k-1})|m - i|$ obtained using Eqs. (11,12) for reduction sequence $Q_L$, while let $S_R(Q_L)$ denote the sum of $\log(D_k)|n - j|$ or $\log(D'_{k-1})|n - j|$. That is, these are the sum of the terms at the left or right ends of the interval, so that $S(Q_L) = S_L(Q_L) + S_R(Q_L)$. Define $S_L(Q_R)$ and $S_R(Q_R)$ similarly so $S(Q_R) = S_L(Q_R) + S_R(Q_R)$.

Suppose first that $i_a = k_a$ for some given $a$, i.e., the left end of $Q_L$ meets the right end of $Q_R$. Then, define a reduction sequence $Q$ by taking the sequence $i_0, \ldots, i_a$ for the left end of $Q$ and $k_0, \ldots, k_a$ for the right end of $Q$. Then, $S(\tau_{[i,k]}) \leq S(Q) \leq S_L(Q_L) + S_R(Q_R) = S(Q_L) + S(Q_R) - S_R(Q_L) - S_L(Q_R)$. However, referring to the calculation in lemma 3, $S_R(Q_L) \geq \Omega(\epsilon l)$ as is $S_L(Q_R)$, which gives the desired result.

So, let us assume that $i_a \neq k_a$ for all $a$. Suppose, without loss of generality, that $h(Q_L) \geq h(Q_R)$. In fact, it may not be possible that $h(Q_L)$ will ever differ from $h(Q_R)$ for the optimal sequences $Q_L, Q_R$ for the given pair of intervals and for the given choice of dimensions in the network so it might suffice to always assume that $h(Q_L) = h(Q_R)$, but we are able to lower bound the mutual information even in this possibly hypothetical case (it is possible for $h(Q_L)$ to differ from $h(Q_R)$ if the intervals have different length).

To simplify notation, let $h = h(Q_R)$. Define $B_L(Q_L)$ to be the sum over the first $h$ applications of Eqs. (11,12) in reduction sequence $Q_L$ of $\log(D_k)|m - i|$ or $\log(D'_{k-1})|m - i|$, while let $B_R(Q_L)$ denote the sum over the first $h$ applications of $\log(D_k)|n - j|$ or $\log(D'_{k-1})|n - j|$. The notation $B_L$ or $B_R$ is intended to indicate that these are the contributions to $S_L$ or $S_R$ arising from the first $h$ applications, i.e., at the “bottom” of the MERA.

FIG. 3: Illustration of part of a MERA network. Only a fragment of the network is shown, so that the three lines leaving upwards connect to other parts of the network, as do the two lines leaving downwards. We illustrate computing mutual information between two intervals, each of three sites. The left interval is represented by the unfilled circles on the leaves of the MERA, while the right interval is represented by the circles with diagonal lines. Unfilled circles with thin outer lines and circles with diagonal lines at higher levels of the MERA represent the intervals that result from applying Eqs. (11,12) for the optimum sequences $Q_L, Q_R$, respectively. Both sequences have $h = 2$. When computing the optimum reduction of the 6-site interval containing both of these 3-site intervals, the resulting intervals contain the unfilled circles with thin outer lines, and the circles with diagonal lines and also the unfilled circles with thicker outer lines. Filled circles indicate sites not in any of these reduction sequences. The squiggly lines crossing the lines of the MERA network represent contributions to $S_R(Q_L)$ and $S_L(Q_R)$, while the dashed squiggly line crossing the line at the top represents an extra term in the entropy to reduce the 6-site interval. The difference between these is equal to the expectation value of the mutual information, up to subleading terms.


Consider applying Eqs. (11,12) a total of $h(Q_R)$ times, using the sequence $i_0, \ldots, i_h$ for the left end and $k_0, \ldots, k_h$ for the right end. Note that this sequence of reductions may not end at the empty interval; rather, it leaves the interval $[i_h, k_h]$. This gives

$$S(\tau_{[i,k]}) \leq B_L(Q_L) + S_R(Q_R) + \Delta, \tag{46}$$

where $\Delta$ is either

$$\Delta = S(\tau(L - h/2)_{[i_h,k_h]}) \tag{47}$$

if $h$ is even or

$$\Delta = S(\sigma(L - (h-1)/2)_{[i_h,k_h]}) \tag{48}$$

if $h$ is odd. That is, $\Delta$ is the entropy of the interval that remains after applying the reduction sequence.

Now use subadditivity. To simplify notation, let us suppose that $h$ is even (we simply do this so that we can write $\tau(\ldots)$ everywhere, rather than having to specify either $\tau(\ldots)$ or $\sigma(\ldots)$ in each case). Then,

$$\Delta \leq S(\tau(L - h/2)_{[i_h,j_h]}) + S(\tau(L - h/2)_{[j_h+1,k_h]}). \tag{49}$$

Note that $k_h = j'_h - 1$ since the right interval vanishes after $h$ applications of the reduction equations; this makes the interval $[j_h + 1, k_h]$ look more symmetric in left and right. Note also that if $h(Q_L) = h(Q_R)$, then $j_h + 1 = i_h$.

We can then upper bound $S(\tau(L - h/2)_{[i_h,j_h]})$ using a reduction sequence with left end $i_h, \ldots, i_{h(Q_L)}$ and right end $j_h, \ldots, j_{h(Q_L)}$, giving

$$S(\tau(L - h/2)_{[i_h,j_h]}) \leq S_L(Q_L) - B_L(Q_L) + S_R(Q_L) - B_R(Q_L). \tag{50}$$

So, from Eqs. (46,49),

$$S(\tau_{[i,k]}) \leq S_L(Q_L) + \left(S_R(Q_L) - B_R(Q_L)\right) + S_R(Q_R) + S(\tau(L - h/2)_{[j_h+1,k_h]}). \tag{51}$$

However,

$$S(\tau(L - h/2)_{[j_h+1,k_h]}) \leq B_R(Q_L) + S_L(Q_R) - \Omega(\epsilon l), \tag{52}$$

which gives the desired bound on the mutual information. To see this, estimate $S(\tau(L - h/2)_{[j_h+1,k_h]})$ using another reduction sequence. In Fig. 3, the interval $[j_h + 1, k_h]$ consists of the two sites with open circles on the row two rows above the bottom (i.e., the bottom row of the level one level above the bottom). The entropy $S(\tau(L - h/2)_{[j_h+1,k_h]})$ is less than or equal to $(k_h - j_h) \cdot \log(D_{L-h/2})$. However, $B_R(Q_L) + S_L(Q_R)$ is greater than or equal to $(k_h - j_h) \cdot \log(D_{L-h/2})$ as can be seen in Fig. 3; that is, the entropy of the two sites $[j_h + 1, k_h]$ is greater than or equal to the sum of logarithms of dimensions of bonds cut by squiggly lines. If $k_h - j_h$ is sufficiently large, then in fact $S(\tau(L - h/2)_{[j_h+1,k_h]}) \leq (k_h - j_h) \cdot \log(D_{L-h/2}) - \Omega(\epsilon l)$; this simply requires that $k_h - j_h$ be large enough that at least one pair of sites in the interval $[j_h + 1, k_h]$ emerge from the same isometry, as occurs in the figure. Alternately, if $k_h = j_h + 1$, then $S(\tau(L - h/2)_{[j_h+1,k_h]}) \leq \log(D_{L-h})$, while $B_R(Q_L) + S_L(Q_R) \geq 2\log(D_{L-h})$. If $k_h < j_h$, then $S(\tau(L - h/2)_{[j_h+1,k_h]}) = 0$. The remaining case is that $k_h = j_h + 2$ but the two sites do not emerge from the same isometry. In this case $S(\tau(L - h/2)_{[j_h+1,k_h]}) \leq 2\log(D_{L-h/2})$. However, by the same calculation in lemma 3 that gave Eq. (43), we find that $B_R(Q_L) \geq D_L + D_{L-1} + D_{L-2} + \ldots + D_{L-h/2}$ and $S_L(Q_R) \geq D_L + D_{L-1} + D_{L-2} + \ldots + D_{L-h/2}$, so $B_R(Q_L) + S_L(Q_R) - S(\tau(L - h/2)_{[j_h+1,k_h]}) \geq 2D_{L-h/2+1} + \ldots$ which is $\Omega(\epsilon l)$. □
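For reference, the way the upper bound on the entropy of the combined interval meets the lower bound on the two single-interval entropies can be written out in one chain. This is only a restatement of the steps of the proof, using the symbols defined there; the $O(\log l)$ term is the $O(1)h(Q)$ correction with $h(Q_L), h(Q_R) \leq \mathrm{const.} \times \log(l)$, which is absorbed in the meaning of $\Omega(\cdot)$ fixed before lemma 3.

```latex
\begin{align*}
S(\tau_{[i,k]}) &\leq S_L(Q_L) + S_R(Q_L) + S_R(Q_R) + S_L(Q_R) - \Omega(\epsilon l)
 = S(Q_L) + S(Q_R) - \Omega(\epsilon l), \\
E\bigl[I([i,j];[j',k])\bigr]
 &= E\bigl[S(\tau_{[i,j]}) + S(\tau_{[j',k]}) - S(\tau_{[i,k]})\bigr] \\
 &\geq \Omega(\epsilon l) - O(1)\bigl(h(Q_L) + h(Q_R)\bigr)
 = \Omega(\epsilon l) - O(\log l).
\end{align*}
```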


CORRELATION DECAY 

We now discuss decay of correlations in this state. We do not prove correlation decay. However, we conjecture that for the MERA state above, for any two regions $X, Y$ separated by distance $l$, we have $\mathrm{Cor}(X : Y) \leq C 2^{-l/\xi}$ for some $C = O(1)$ and some $\xi$ bounded by $O(1)/\epsilon$ with probability that tends to 1 as $D_{L-\log_2(l)}$ tends to infinity. In the discussion, we briefly discuss controlling rare events.

The simplest version of this correlation decay to consider is when $X$ consists of a single site and $Y$ is separated from $X$ by at least 1 site. Thus, the site in $X$ consists of one of the two sites which is in the output of some given isometry $W$, and $Y$ does not contain the other site which is in the output of that isometry. Let us divide the system into three subsystems. Let $B = X$. Let $E$ be the other site which is in the output of the same isometry as $X$, and let $A$ consist of the rest of the system. (We rename $X$ as $B$ to make the notation more suggestive of a quantum channel from Alice to Bob, as we will use ideas from quantum channels.) Since $Y \subset A$, it suffices to consider correlation functions $\langle \psi | O_A O_B | \psi \rangle$ for $O_A, O_B$ supported on $A, B$ respectively.

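Consider the two subsystems $A$ and $BE$, and make a singular value decomposition of the wavefunction $\psi$, so that we write

$$\psi = \sum_\alpha A(\alpha) |\alpha\rangle_A \otimes |\alpha\rangle_{BE}, \tag{53}$$

where $|\alpha\rangle_A$ and $|\alpha\rangle_{BE}$ are complete bases of states on $A$ and $BE$, respectively, and $A(\alpha)$ are complex scalars with $\sum_\alpha |A(\alpha)|^2 = 1$. Let $O_A$ have matrix elements $(O_A)_{\beta,\alpha}$ in this basis. Then,

$$\langle \psi | O_A O_B | \psi \rangle = \sum_{\alpha,\beta} \overline{A(\beta)} A(\alpha) \left( {}_A\langle \beta | O_A | \alpha \rangle_A \right) \left( {}_{BE}\langle \beta | O_B | \alpha \rangle_{BE} \right) = \mathrm{tr}(\tilde{O}_A O_B), \tag{54}$$

where $\tilde{O}_A$ is defined by its matrix elements

$$(\tilde{O}_A)_{\beta\alpha} = (O_A)_{\alpha\beta} A(\beta) \overline{A(\alpha)}. \tag{55}$$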
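The identity between the correlation function and the trace against the rescaled operator can be checked numerically. The following is a minimal sketch, not part of the original argument: the state, the operators and all names (`correlator_two_ways`, `random_hermitian`) are assumptions made for illustration. The operator on the $BE$ side is taken as an arbitrary operator on $BE$ (an operator on $B$ alone is the special case $O_B \otimes I_E$), and the Schmidt coefficients returned by the SVD are real and non-negative, which is a special case of the complex scalars $A(\alpha)$.

```python
import numpy as np

def random_hermitian(d, rng):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (X + X.conj().T) / 2

def correlator_two_ways(dA=6, dBE=4, seed=0):
    rng = np.random.default_rng(seed)
    # psi = sum_{a,b} M[a, b] |a>_A |b>_BE, normalized to unit norm
    M = rng.normal(size=(dA, dBE)) + 1j * rng.normal(size=(dA, dBE))
    M /= np.linalg.norm(M)
    OA, OB = random_hermitian(dA, rng), random_hermitian(dBE, rng)

    # Direct evaluation of <psi| O_A (x) O_B |psi>
    direct = np.trace(M.conj().T @ OA @ M @ OB.T)

    # Schmidt form, Eq. (53): M = U diag(s) Vh, so A(alpha) = s[alpha] >= 0
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    OA_schmidt = U.conj().T @ OA @ U          # (O_A)_{beta,alpha}
    OB_schmidt = Vh.conj() @ OB @ Vh.T        # <beta| O_B |alpha>_BE
    OA_tilde = OA_schmidt.T * np.outer(s, s)  # Eq. (55) with real A(alpha)
    via_schmidt = np.trace(OA_tilde @ OB_schmidt)
    return direct, via_schmidt, s

if __name__ == "__main__":
    direct, via_schmidt, s = correlator_two_ways()
    print(abs(direct - via_schmidt))   # agreement up to rounding error
```

The two evaluations agree to rounding error for any choice of dimensions, which is all that Eq. (54) asserts.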

To estimate the correlation decay, we must maximize the correlation function over $O_A, O_B$ with $\|O_A\|, \|O_B\| \leq 1$. Since the maximization over operators with bounded infinity norm may not be easy, we instead derive a bound in terms of a maximization over operators with bounded $\ell_2$ norm (which we write $|\ldots|_2$) for which the maximization reduces to a problem in linear algebra. We have

$$\begin{aligned} |\tilde{O}_A|_2 &= \sqrt{\mathrm{tr}(\tilde{O}_A^2)} \\ &= \sqrt{\sum_{\alpha,\beta} |A(\beta)|^2 |A(\alpha)|^2 |(O_A)_{\alpha\beta}|^2} \\ &\leq \left(\max_\alpha |A(\alpha)|^2\right) \cdot |O_A|_2 \\ &\leq \left(\max_\alpha |A(\alpha)|^2\right) \sqrt{d_A}, \end{aligned} \tag{56}$$

where the last line follows since $\|O_A\| \leq 1$.

Let $A, B, E$ have Hilbert space dimensions $d_A, d_B, d_E$ respectively (in our particular case, we have $d_B = d_E = D_L$). In fact, while the Hilbert space dimension of $A$ diverges with system size, since the rank of the density matrix on $BE$ is at most $(D'_{L-1})^2$, we can assume $d_A = (D'_{L-1})^2$. It is not hard to show that $\max_\alpha |A(\alpha)|^2$ is approximately equal to $1/d_A$ times a constant with high probability (i.e., with probability that tends to 1 as $d_A$ tends to infinity). To see this, note that the $|A(\alpha)|^2$ are the eigenvalues of the reduced density matrix on the two sites entering the isometry $W$. Each of those two sites is the output of some isometries; call those isometries $V, V'$. For random choices of $V, V'$, for arbitrary input state to $V \otimes V'$, indeed the output state on the given two sites will have all the eigenvalues close to $1/d_A$. So, $|\tilde{O}_A|_2$ is bounded by a constant times $1/\sqrt{d_A}$, with high probability. At the same time $|O_B|_2$ is bounded by $\sqrt{d_B}$ if we define the $\ell_2$ norm using the trace on $B$, rather than the trace on $BE$.

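We can in fact tighten the bound (56) on $|\tilde{O}_A|_2$, if desired. Let $\rho$ be the diagonal matrix with entries $|A(\alpha)|^2$. We have $|\tilde{O}_A|_2^2 = \mathrm{tr}(O_A \rho O_A \rho)$. Note that $\mathrm{tr}(O_A \rho O_A \rho) \leq \mathrm{tr}(O_A \rho \rho^\dagger O_A^\dagger) = \mathrm{tr}(O_A^\dagger O_A \rho^2)$. For $\|O_A\| \leq 1$, we have $\mathrm{tr}(O_A^\dagger O_A \rho^2) \leq \mathrm{tr}(\rho^2)$. So, $|\tilde{O}_A|_2 \leq \sqrt{\mathrm{tr}(\rho^2)}$. Note that this is equal to the exponential of minus one-half the $S_2$ entropy of $\rho$.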
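The ordering of the two bounds can be illustrated on a random instance. This is a minimal sketch under assumptions made only for illustration: the Schmidt weights are drawn at random, the operator is a random Hermitian matrix rescaled to operator norm 1, and the rescaled operator is represented as $\rho^{1/2} O_A \rho^{1/2}$, which has the same $\ell_2$ norm as the operator defined in Eq. (55). The name `norm_bounds` is hypothetical.

```python
import numpy as np

def norm_bounds(d=8, seed=1):
    rng = np.random.default_rng(seed)
    p = rng.random(d)
    p /= p.sum()                              # Schmidt weights |A(alpha)|^2
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    O = (X + X.conj().T) / 2
    O /= np.linalg.norm(O, 2)                 # operator norm ||O_A|| = 1
    # rho^{1/2} O rho^{1/2}, with rho = diag(p)
    O_tilde = np.sqrt(p)[:, None] * O * np.sqrt(p)[None, :]
    lhs = np.linalg.norm(O_tilde)             # l2 (Frobenius) norm of the rescaled operator
    tight = np.sqrt(np.sum(p**2))             # sqrt(tr rho^2) = exp(-S_2 / 2)
    loose = p.max() * np.sqrt(d)              # right-hand side of Eq. (56)
    return lhs, tight, loose

if __name__ == "__main__":
    print(norm_bounds())   # expected ordering: lhs <= tight <= loose
```

The tightened bound matters when the weights are far from uniform; for nearly uniform weights the two right-hand sides are close to each other.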

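Define a super-operator $\mathcal{E}(\ldots)$ by

$$\mathcal{E}(O) = \sqrt{\frac{d_B}{d_A}}\, \mathrm{tr}_E(W O W^\dagger). \tag{57}$$

This super-operator is a quantum channel multiplied by the scalar $\sqrt{d_B/d_A}$. Then,

$$\mathrm{Cor}(X : Y) \leq \mathrm{const.} \times \max_{\tilde{O}_A, |\tilde{O}_A|_2 \leq 1}\; \max_{O_B, |O_B|_2 \leq 1} \left( \mathrm{tr}(O_B \mathcal{E}(\tilde{O}_A)) - \mathrm{tr}(O_B \mathcal{E}(\sqrt{d_A}\rho))\, \mathrm{tr}(\tilde{O}_A) \right), \tag{58}$$

where we have rescaled $\tilde{O}_A, O_B$ to have $\ell_2$ norm equal to 1, absorbing factors of $1/\sqrt{d_A}$ and $\sqrt{d_B}$ into $\mathcal{E}(\ldots)$, and where the constant is present because the bound (before re-scaling) is that $|\tilde{O}_A|_2$ is bounded by a constant times $1/\sqrt{d_A}$, with high probability.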

We now consider the super-operator $\mathcal{E}(\ldots)$. We now consider the case of general $d_A, d_B, d_E$. The state $\mathcal{E}(\rho)$, which is the output state of this super-operator for the density matrix as input, may not itself be exactly maximally mixed. However, it is very close to maximally mixed with high probability if $d_B \ll d_A d_E$ and if $\rho$ is close to maximally mixed. Further, for any traceless operator $O$, we have $\mathrm{tr}(\mathcal{E}(O)) = 0$. Hence, the maximally mixed state is very close to a right-singular vector of $\mathcal{E}$ if $d_B \ll d_A d_E$ and there is a singular value of $\mathcal{E}$ very close to 1. So, the term $-\mathrm{tr}(O_B \mathcal{E}(\sqrt{d_A}\rho))\,\mathrm{tr}(\tilde{O}_A)$ is close to projecting out the largest singular vector of $\mathcal{E}(\ldots)$.

So, the important quantity for correlations is the magnitude of the second largest singular value. Indeed, what we would like to have is that $\mathcal{E}(\ldots)$ is a non-Hermitian expander (non-Hermitian in that $\mathcal{E}(\ldots)$ is not Hermitian viewed as a linear super-operator), meaning that it has one singular value close to 1 and all others separated from 1 by a gap. Calculating the singular values of $\mathcal{E}(\ldots)$ is likely similar to the calculation in Ref. [12] with some additional complications because we are interested in a very different choice of dimensions. For one thing, $d_E$ and $d_B$ are comparable here rather than having $d_E \ll d_B$. For another thing, $d_A \neq d_B$, so the super-operator $\mathcal{E}(\ldots)$ has a multiplicative prefactor $\sqrt{d_B/d_A}$ compared to a quantum channel.

We leave a proof that it is an expander for a future paper. However, we give some numerical and analytical evidence. Let $x = d_B/(d_A d_E)$ and $y = d_A/(d_B d_E)$. We conjecture that $\mathcal{E}(\ldots)$ is an expander if $x, y \ll 1$. More precisely, what we conjecture is that for a random choice of $W$ with high probability the difference between the largest singular value and 1 is bounded by some polynomial in $x, y$ and also that the second largest singular value is bounded by some polynomial in $x, y$. Note that certainly we do not expect to get an expander if $y \approx 1$. If $y = 1$, then all singular values are equal to 1.
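The super-operator of Eq. (57) is easy to build explicitly for small dimensions. The sketch below is one possible construction, not the code used for the figures: the isometry is sampled from the QR factor of a complex Gaussian matrix (an assumption about the sampling procedure; it fixes the singular-value statistics only up to the usual phase convention, which does not affect singular values), and all function names are hypothetical.

```python
import numpy as np

def random_isometry(d_in, d_out, rng):
    """Isometry W: C^{d_in} -> C^{d_out} (d_out >= d_in) from a Gaussian matrix."""
    G = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
    Q, _ = np.linalg.qr(G)          # Q has orthonormal columns, shape (d_out, d_in)
    return Q

def superoperator_matrix(W, dA, dB, dE):
    """Matrix of O -> sqrt(dB/dA) tr_E(W O W^dagger), shape (dB^2, dA^2)."""
    T = W.reshape(dB, dE, dA)                         # index order (b, e, a)
    K = np.einsum('bei,cej->bcij', T, T.conj())       # tr_E(W |i><j| W^dagger)[b, c]
    return np.sqrt(dB / dA) * K.reshape(dB * dB, dA * dA)

def singular_values(dA, dB, dE, seed=0):
    rng = np.random.default_rng(seed)
    W = random_isometry(dA, dB * dE, rng)
    return np.linalg.svd(superoperator_matrix(W, dA, dB, dE), compute_uv=False)

if __name__ == "__main__":
    print(singular_values(9, 3, 3))        # y = 1: every singular value equals 1
    print(singular_values(12, 6, 6)[:4])   # y = 1/3: leading values of a generic draw
```

For $y = 1$ the isometry is unitary and the partial trace alone determines the spectrum, which is why all singular values coincide in that case.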

We can estimate the average over $W$ of the sum of squares of the singular values of $\mathcal{E}(\ldots)$ using the same techniques as we used to estimate $E[\exp(-S_2(\ldots))]_W$ previously, as this sum of squares is also a second order polynomial in $W$ and in $\overline{W}$. For $d_B = d_E$ and $d_A \ll d_B d_E$, one finds that this sum of squares is equal to $d_A$ up to subleading corrections. The number of non-zero singular values is equal to $d_B^2$ in this case, so that if all singular values (with the exception of the largest) have roughly the same magnitude, then this magnitude is roughly

$$\sqrt{d_A/d_B^2} = \sqrt{y}. \tag{59}$$
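The estimate for the sum of squares can be compared against a single random draw. This is a minimal sketch under the same assumptions as any small numerical experiment of this kind: the isometry is sampled from the QR factor of a complex Gaussian matrix, the dimensions are small illustrative values, and the function name is hypothetical. The sum of squared singular values is computed as the squared Frobenius norm of the matrix of the super-operator.

```python
import numpy as np

def sum_of_squared_singular_values(dA, dB, dE, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(dB * dE, dA)) + 1j * rng.normal(size=(dB * dE, dA))
    W, _ = np.linalg.qr(G)                       # random isometry, shape (dB*dE, dA)
    T = W.reshape(dB, dE, dA)                    # index order (b, e, a)
    K = np.einsum('bei,cej->bcij', T, T.conj()).reshape(dB * dB, dA * dA)
    E = np.sqrt(dB / dA) * K                     # matrix of the super-operator, Eq. (57)
    return np.linalg.norm(E) ** 2                # Frobenius norm squared = sum of sigma^2

if __name__ == "__main__":
    dA, dB, dE = 12, 6, 6
    total = sum_of_squared_singular_values(dA, dB, dE)
    y = dA / (dB * dE)
    print(total, dA)                             # the two should be close
    print(np.sqrt(total / dB**2), np.sqrt(y))    # typical magnitude vs sqrt(y)
```

At these small sizes the agreement is only approximate, since the subleading corrections are not negligible.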

We have numerically investigated the properties of this super-operator. First, we observe qualitatively that there indeed is a gap once $x, y \ll 1$. In Fig. 4 we show an example with $d_A = 80$, $d_B = d_E = 10$. Even in this case, where $y = 0.8$ which is not that small, we observe a distinct gap between the first singular value and the rest. We plot the singular values $\lambda(i)$ in descending order as a function of $i$ from $i = 0, \ldots, d_B^2 - 1$.

FIG. 4: Singular values of $\mathcal{E}(\ldots)$ for a random choice of $W$ for $d_A = 80$, $d_B = d_E = 10$. Singular values $\lambda(i)$ are plotted in descending order. Largest singular value is equal to 1.0067..., while next largest singular values are 0.9596..., 0.9592..., 0.9587....

Next, to test the scaling of the singular values, we first consider the particular case that $d_A = d_B = d_E$. This is not the relevant case of interest for the MERA state constructed; however, it is still interesting as a way to test the scaling. In this case, we have $\sqrt{d_A/d_B^2} = 1/\sqrt{d_B}$. What we find is that indeed scaling holds. We are able to construct a scaling collapse, plotting the singular values $\lambda(i)$ from $i = 1, \ldots, d_B^2 - 1$ in descending order, i.e., not including the leading singular value. In this plot we plot $\lambda(i) \cdot \sqrt{d_B}$ as a function of $i/d_B^2$. As shown in Fig. 5, we are able in this way to almost perfectly collapse curves for different choices of $d_B$. Further, the collapse holds even for the leading singular values; that is, we have observed that the second largest singular value scales as $1/\sqrt{d_B}$. So, in this case, we have strong numerical evidence for the polynomial decay as a function of $x, y$.
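A rough version of this scaling test can be reproduced at small sizes. The sketch below is an illustration, not the procedure used for the figure: it draws one random isometry per dimension $d$ (sampled from the QR factor of a complex Gaussian matrix, which is an assumption), sets $d_A = d_B = d_E = d$, and returns the second largest singular value multiplied by $\sqrt{d}$. If the scaling described above holds, this rescaled value should be roughly independent of $d$. The function name is hypothetical.

```python
import numpy as np

def rescaled_second_singular_value(d, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * d, d)) + 1j * rng.normal(size=(d * d, d))
    W, _ = np.linalg.qr(G)                    # random isometry, shape (d*d, d)
    T = W.reshape(d, d, d)                    # index order (b, e, a)
    # Matrix of O -> tr_E(W O W^dagger); the prefactor sqrt(dB/dA) equals 1 here
    E = np.einsum('bei,cej->bcij', T, T.conj()).reshape(d * d, d * d)
    s = np.linalg.svd(E, compute_uv=False)    # sorted in descending order
    return s[1] * np.sqrt(d)

if __name__ == "__main__":
    for d in (6, 10, 14):
        print(d, rescaled_second_singular_value(d))
```

A single draw per size fluctuates, so averaging over several seeds would give a cleaner comparison between dimensions.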

Before considering the case of interest to us, let us explain why we are interested in having a polynomial decay of the second largest singular value as a function of $x, y$. This is due to our desire for exponential decay of correlations at all length scales, not just for a single $X$ with $Y$ separated from $X$ by one site. The MERA states used to describe a conformal field theory at criticality display a power law decay of correlation functions as a function of distance [13]. To understand this polynomial decay, consider a correlation of two operators $O_i, O_j$ supported on single sites $i, j$. Then one can iteratively map an operator such as $O_i$ or $O_j$ into an operator at higher levels of the MERA. This map is a linear map; it is in fact related to the adjoint of a super-operator such as $\mathcal{E}(\ldots)$ that we consider; the map in Ref. [13] is regarded as moving operators up to higher levels of the MERA rather than, as we have described it, moving states to lower levels of the MERA. To move up one level in the MERA, one must apply two super-operators, as each level in the MERA corresponds to two isometries $V, W$. This linear map leads to an exponential decay of the difference between the operator and the identity operator as a function of level in the usual MERA states; since the number of levels between $i, j$ is logarithmic in the distance between $i$ and $j$, this leads to a polynomial decay. The reason for the exponential decay is that in such MERAs, the isometries are taken in a scale-invariant fashion, so that they are the same at all levels (or all except the bottom few levels) and so the super-operator has a fixed gap to the second largest singular value at all levels. In our MERA, however, the isometries change with level. Thus, we hope that the decay when moving from one level to the next will be polynomial in $x, y$. Since $y \sim \exp(-\epsilon 2^{L-k})$ for isometries in $W_k$, a polynomial decay in the smallest $y$ (which occurs at the highest level, giving a $y$ which is exponentially small in the spacing between sites) will lead to an exponential decay in $i - j$.
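The counting behind this last step can be sketched as follows. The exponent $c > 0$ and the meeting level $k_*$ are assumptions introduced only for illustration: suppose the relevant singular value at level $k$ is bounded by $y_k^c$ with $y_k \sim \exp(-\epsilon 2^{L-k})$, and suppose the causal cones of two sites a distance $l$ apart meet at the level $k_*$ with $2^{L-k_*} \sim l$. Then the accumulated factor would be

```latex
\prod_{k=k_*}^{L} y_k^{\,c}
 \sim \exp\Bigl(-c\,\epsilon \sum_{k=k_*}^{L} 2^{L-k}\Bigr)
 = \exp\bigl(-c\,\epsilon\,(2^{L-k_*+1}-1)\bigr)
 \sim \exp(-2c\,\epsilon\, l),
```

so the sum is dominated by the highest level reached, and the implied correlation length is of order $1/\epsilon$, in line with the conjectured bound $\xi \leq O(1)/\epsilon$.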

One complication in this is that when we map an operator on a single site $i$ of the MERA to higher levels of the MERA, the result is no longer an operator supported on a single site. The so-called causal cone of such an operator (i.e., the support of the operator after it is mapped to higher levels of the MERA; this support is the same as the set of sites which have $i$ in their light-cone as we have defined the light-cone) does not consist of a single site at each level. Rather, the causal cone consists of some small number of sites [13], depending upon the exact MERA chosen. However, it seems likely that, since we are considering an $\ell_2$ norm, if we can show a gap in the singular values of the super-operator corresponding to the map of a single site operator upwards by one level of the MERA, it will also be possible to show a gap in the map of an operator supported on some small number of sites, as the $\ell_2$ norm does have the nice property that the singular values of a product of super-operators can be determined from the singular values of the individual super-operators. If we instead worked with $\ell_\infty$ norms, there would be difficult multiplicativity questions that would arise and perhaps having a bound in the $\ell_\infty \to \ell_\infty$ norm of a pair of super-operators would not help in bounding the $\ell_\infty \to \ell_\infty$ norm of the product. In this way, we conjecture that it will be possible to show at least an exponential decay of $\mathrm{Cor}(X : Y)$ for distances sufficiently large compared to the diameter of $X$ and the diameter of $Y$.

FIG. 5: Singular values of $\mathcal{E}(\ldots)$ for two random choices of $W$, one with $d_A = d_B = d_E = d = 10$ (shown in blue) and the other with $d_A = d_B = d_E = d = 20$ (shown in green). The y-axis shows $\sqrt{d}\,\lambda(i)$, while the x-axis shows $i/d^2$. The largest singular value for each super-operator is not plotted. The two largest singular values for the first super-operator are equal to 1.05..., 0.645..., while the two largest singular values for the second super-operator are equal to 1.025..., 0.456....

A more difficult question is whether we can show an exponential decay even if the diameters of $X, Y$ are large compared to the distance between $X, Y$. We conjecture that this will also hold. We take an operator $O_X$ and apply the super-operator $\mathcal{E}(\ldots)^\dagger$ to map $O_X$ upward in the MERA and similarly map $O_Y$ and apply this process repeatedly until $X, Y$ meet. The intuitive idea is that at every step of this process we consider the site $i$ at the leftmost edge of $X$ and we decompose the operator $O_X$ on $X$ into a sum of two terms, $O_X^{0} + O_X^{1}$, where $O_X^{0}$ is the identity operator on $i$ tensored with some other operator on the rest of $X$, while $O_X^{1}$ vanishes after tracing over $i$. The site $i$ is one of two sites output from some given isometry. Assume that the other site output from that isometry is to the left of $i$ so that it is not in $X$; in this case we say that “a site is traced over at the left end”. Note that it is not necessary that a site be traced over on a given step; for example, if $X$ consists of two sites which are output from the same isometry, then no site is traced over. However, if a site is not traced over on a given step, then a site must be traced over at the next step. So, suppose that a site is traced over. Let $\mathcal{E}(\ldots)$ be the super-operator associated with this isometry and tracing over the site $i - 1$. We would then use the bound on singular values of the super-operator $\mathcal{E}(\ldots)$ to show that the $\ell_2$ norm of $O_X^{1}$ decays by an amount $\exp(-\mathrm{const.} \times \epsilon 2^{L-k})$ after applying the super-operator $\mathcal{E}(\ldots)^\dagger$ to map it to an operator higher in the MERA, while $O_X^{0}$ maps to an operator with increased separation between $X$ and $Y$. In this manner, we conjecture that at some level $k$, with $k \sim \log(l)$, we must have a decay in $\ell_2$ norm by $\exp(-\mathrm{const.} \times \epsilon 2^{L-k})$.

FIG. 6: Singular values of $\mathcal{E}(\ldots)$ for three random choices of $W$, one with $d_A = 50$, $d_B = d_E = d = 10$ (shown in blue), one with $d_A = 128$, $d_B = d_E = 16$ (shown in green), and one with $d_A = 200$, $d_B = d_E = 20$ (shown in red). The y-axis shows $\lambda(i)$, while the x-axis shows $i/d_B^2$.

When we turn to the case of interest to us, with $d_A \gg d_B$ but $y \ll 1$, we do not find a clear scaling collapse. In this case, since $x \ll y$, we might hope that the scaling collapse would hold with two different super-operators with the same $y$. In Fig. 6 we see that this is not the case for three different super-operators with $y = 1/2$ for all three and $d_B = d_E = 10, 16, 20$. It is possible, however, that for large enough $d_B$ at fixed $y$ the singular values will eventually collapse on each other; the curves are becoming flatter with increasing $d_B$, suggesting that this may happen. If such a collapse happens for large $d_B$ for the entire curve, then indeed the second largest singular value must be proportional to $\sqrt{y}$ for large $d_B$. Even if there are corrections to this which vanish polynomially in $d_B$, this would still suffice. Some evidence for a collapse is shown in Fig. 7. Here we show an attempt to collapse the three curves by rescaling $(\lambda(i) - \mathrm{const.})\, d_B^\alpha$, where the constant 0.706... is chosen to match the approximate crossing point of the curves and $\alpha = 2/3$ was chosen after some experimentation. Good collapse is seen between the curves with $d_B = 16, 20$, while the curve with $d_B = 10$ does not collapse as well, especially for large $i$.


DISCUSSION 

While this work was in progress, another work constructed a MERA state for which the entanglement entropy 
was exactly given by the minimum length of curves cutting through the MERA network [14]. There are two main
differences in the type of states constructed. First, we used random tensors, instead of the perfect tensors used there. 
Second, we considered a very different set of choices of dimensions at different levels of the MERA, in our goal of 
constructing a state with high entanglement and low correlations. These two choices may have some interpretation 
in the language of holography and quantum gravity as follows. The different choice of dimensions may correspond to 
some different choice of geometry in the bulk space, rather than an AdS geometry. 

The choice of random tensors, however, might be interpretable in terms of quantum fluctuations in the bulk geometry: instead of the entanglement entropy being exactly expressed in terms of a single curve cutting through the MERA, the optimum reduction sequence (note that each reduction sequence corresponds uniquely to a curve) gives only upper and lower bounds on the expected entanglement entropy, with a possible logarithmic difference between those results. However, the expected exponential of minus the $S_2$ Renyi entropy, $E[\exp(-S_2(\ldots))]$, can be exactly expressed as a sum over reduction sequences (or curves). This difference between a minimization and a sum is reminiscent of the difference between classical and quantum mechanics (least action path compared to path integral). If the dimensions $D_k, D'_k$ become large (and importantly also the differences between certain sums of the $D_k, D'_k$ become large), then the sum becomes dominated by a single curve. This is perhaps reminiscent of the fact that certain random matrix theories can be interpreted as a sum over random surfaces, with the limit of large matrix size in the random matrix theory involving a sum only over a single genus; our theory is a more general kind of random matrix theory, but perhaps something similar happens.

FIG. 7: Re-scaled singular values for same three channels as in Fig. 6 for $i = 1, \ldots, d_B^2 - 1$.

Finally, the reader might note that we only prove results about the expectation value of the entanglement entropy, 
rather than proving results about the entanglement entropy for a specific choice of isometries in the MERA. For 
example, lemma [3] only lower bounds the expectation value of the entanglement entropy for intervals of length l. The 
reader might wonder: is there a specific choice of isometries for which for all intervals of length l , the entanglement 
entropy is within some constant factor of its expectation value? In this paper, we did not worry at all about trying 
to prove such results. However, we briefly mention some possible ways to try to do this. One might, for example, try 
to use concentration of measure arguments to estimate fluctuations about the average. This could perhaps show that 
the probability of a “bad event”, such as low entanglement entropy (or perhaps long correlation length, if indeed it is 
true that the state is short-range correlated as we conjecture), is exponentially small in dimension. This approach has 
the downside that the system size is exponentially large in D^, so that even if a bad event is exponentially unlikely 
in any particular part of the system, it may be likely to occur somewhere. To resolve this issue, one might try to 
use the Lovász local lemma in some way: it might be possible to show that the event that the entanglement entropy 
of some given interval [i, j] is small is independent of the event that the entanglement entropy of some other 
interval [i', j'] is small if |i - i'|, |j - j'| are sufficiently large. Or, more simply stated: perhaps if a bad event occurs 
locally, one might resample those isometries and leave the other isometries unchanged. Perhaps another approach 
to avoiding having bad events occur somewhere is to reduce the amount of randomness: rather than choosing all 
isometries independently at random, one might instead take all isometries W at a given level to be the same and 
sample that isometry at random, independently for each level, and similarly take all V at a given level to be the same. 
This approach has the downside that it complicates the calculations of the entanglement entropy. For example, if we 
consider $S_2(\sigma(k)_{[i,j]})$ with $i \bmod 2 = 0$ and $j \bmod 2 = 1$, then $\exp(-S_2(\ldots))$ is now a fourth order polynomial in $W$ 
and $\overline{W}$, where $W$ is the isometry at the given level of the MERA. This leads to additional terms in the equation for 
$S_2$, beyond those in Eq. (23). These extra terms likely do not change the result that we have found for the mutual 
information, however. 
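
To make the tension between "exponentially unlikely locally" and "exponentially many places" concrete, the following sketch compares a plain union bound with the condition of the symmetric Lovász local lemma (if every bad event has probability at most p and is independent of all but at most d other bad events, then $e\,p\,(d+1) \le 1$ suffices for there to be a positive probability that no bad event occurs). The numbers used are hypothetical, and whether the bad events in the random MERA have the required limited dependence is exactly the open point discussed above; the code only checks the arithmetic of the two criteria.

```python
import math

def union_bound(p_bad, n_events):
    """Upper bound on the probability that at least one of n_events bad events occurs."""
    return min(1.0, p_bad * n_events)

def lll_condition_holds(p_bad, max_dependencies):
    """Check the symmetric Lovasz local lemma condition e * p * (d + 1) <= 1.

    p_bad: upper bound on the probability of each bad event.
    max_dependencies: assumed bound d on how many other bad events each one
    can depend on (a hypothetical input, not derived from the MERA here).
    """
    return math.e * p_bad * (max_dependencies + 1) <= 1.0

if __name__ == "__main__":
    p = math.exp(-20)      # hypothetical: bad event exponentially unlikely in a dimension of 20
    n = 2 ** 40            # hypothetical: number of intervals in an exponentially large system
    print("union bound:", union_bound(p, n))
    print("LLL condition with d = 1000:", lll_condition_holds(p, 1000))
```

The union bound degrades with the total number of events, whereas the local lemma condition depends only on the number of events that each one can influence, which is why it is a natural tool when the system size is very large.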

One simple way to reduce the randomness without complicating the calculation of the asymptotic behavior of the 
entanglement entropy is to choose the isometries at each level of the MERA to repeat with some sequence. That is, if 
we consider the isometry W at some given level of the MERA, and this isometry is a product of n isometries on pairs of 
sites, then rather than choosing them all independently as done in this paper, or choosing them all the same, 
one could choose them so that $W_1, \ldots, W_a$ are sampled independently for some a, and then have the sequence repeat 
so that $W_i = W_{i-a}$. In this way, if we calculate the entanglement entropy of an interval short compared to a, we find the 
same Eqs. (23,26). We keep a the same at every level; then, a large interval of some length l would have additional 
terms present at the lower levels, but once one reaches a level of the MERA of order $\log_2(l)$ we again find the same 
Eqs. (23,26); note that it is at such a level that the dominant contributions to the entanglement entropy occur, and 
so one finds the same results as in Lemmas 3 and 4 for the asymptotic behavior. We leave these questions aside, however, 
until a proof of the correlation decay is given. 
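
The repeating choice of isometries described above can be sketched as follows. This is a minimal illustration under our own assumptions: the function names, the dimensions, and the representation of an isometry as a list of orthonormal column vectors are hypothetical, and the isometries are drawn by Gram-Schmidt orthogonalization of complex Gaussian vectors, a standard way to sample from the Haar measure. Only the period-a structure $W_i = W_{i-a}$ is taken from the text.

```python
import math
import random

def overlap(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def random_isometry(rows, cols, rng):
    """Sample an isometry as `cols` orthonormal column vectors of length `rows`.

    Uses modified Gram-Schmidt on complex Gaussian vectors (requires cols <= rows).
    """
    columns = []
    for _ in range(cols):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(rows)]
        for u in columns:
            c = overlap(u, v)
            v = [y - c * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(abs(y) ** 2 for y in v))
        columns.append([y / norm for y in v])
    return columns

def periodic_level(n, a, rows, cols, seed=0):
    """Return n isometries for one level, with W_i = W_{i-a} for all i >= a.

    Only the first a isometries are sampled independently; the rest repeat.
    """
    rng = random.Random(seed)
    base = [random_isometry(rows, cols, rng) for _ in range(a)]
    return [base[i % a] for i in range(n)]

if __name__ == "__main__":
    level = periodic_level(n=10, a=3, rows=4, cols=2)
    print(all(level[i] is level[i - 3] for i in range(3, 10)))
```

With this structure the number of independent random choices at each level is a rather than n, while any interval short compared to a still sees independently sampled isometries.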

As a final remark, one may modify the state by changing the recursion relations (36,37), replacing the factor 
$2^{L-k}$ by $(2^{L-k})^{\kappa}$ for an exponent $\kappa$. Having done this, for any $\kappa < 1$ and sufficiently small $\epsilon$, one can take L arbitrarily 
large (i.e., L is no longer restricted to be of order $1/\epsilon$) and have $\log(D_k)$, $\log(D'_k)$ roughly proportional to $2^{L-k}$ (the 
factor $(2^{L-k})^{\kappa}$ becomes negligibly small). In this manner, it seems likely that the resulting state will combine a 
volume law for entanglement entropy with almost exponentially decaying correlation functions (correlation between 
two regions separated by l sites proportional to $\exp(-l^{\kappa}/\mathrm{const.})$ for some constant). Generalizing this to higher 
dimensional MERA states[TS], we conjecture that one can obtain MERA states in d spatial dimensions with volume 
law entanglement and correlations decaying as $\exp(-l^{\kappa}/\mathrm{const.})$ for any $\kappa < d$ (in particular, for d = 2 it seems that 
one can obtain super-exponential correlation decay and volume law entanglement). 